Copilot commented on code in PR #16261:
URL: https://github.com/apache/lucene/pull/16261#discussion_r3430348822
##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -166,6 +167,31 @@ public Iterator<Object> iterator() {
return map.originalKeySet().iterator();
}
+ /** Returns {@code true} if this set matches entries case-insensitively. */
+ public boolean isIgnoreCase() {
+ return map.isIgnoreCase();
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (o == this) return true;
+ if (!(o instanceof CharArraySet other)) return false;
+ if (isIgnoreCase() != other.isIgnoreCase()) return false;
+ if (size() != other.size()) return false;
+ return containsAll(other);
+ }
+
+ @Override
+ public int hashCode() {
+ int h = Boolean.hashCode(isIgnoreCase());
+ for (char[] key : map.keys) {
+ if (key != null) {
+ h += Arrays.hashCode(key);
+ }
+ }
Review Comment:
`hashCode()` iterates over `map.keys` (the backing hash table array), which
makes the runtime proportional to the table capacity rather than the number of
elements. This can be significantly more expensive for sparse tables. Prefer
iterating only over present keys (e.g., via `map.originalKeySet()` / the set
iterator) to keep `hashCode()` closer to O(size).
##########
lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java:
##########
@@ -373,8 +373,34 @@ public void testContainsWithNull() {
public void testToString() {
CharArraySet set = CharArraySet.copy(Collections.singleton("test"));
- assertEquals("[test]", set.toString());
+ assertEquals("[test](ignoreCase=false)", set.toString());
set.add("test2");
assertTrue(set.toString().contains(", "));
+
+ CharArraySet ignoreCase = new CharArraySet(Collections.singleton("test"),
true);
+ assertEquals("[test](ignoreCase=true)", ignoreCase.toString());
+ }
+
+ public void testEqualsAndHashCode_sameContentSameIgnoreCase() {
+ for (boolean ignoreCase : new boolean[] {false, true}) {
+ CharArraySet a = new CharArraySet(Arrays.asList(TEST_STOP_WORDS),
ignoreCase);
+ CharArraySet b = new CharArraySet(Arrays.asList(TEST_STOP_WORDS),
ignoreCase);
+ assertNotSame(a, b);
+ assertEquals(a, b);
+ assertEquals(a.hashCode(), b.hashCode());
+ }
+ }
Review Comment:
The new coverage verifies same-content/same-mode equality, but it doesn’t
cover the key contract scenario for `ignoreCase=true` where the two sets
contain the same terms with different casing (e.g., `["Hund"]` vs
`["hund"]`). Add a test asserting `equals()` and `hashCode()` are consistent
in that case to prevent regressions in the ignore-case hashing behavior.
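
To make the missing scenario concrete, here is a minimal, JDK-only sketch of the check such a test would perform. It is not Lucene code: `EqualsHashCodeContractCheck`, `hashConsistentWithEquals`, and `caseInsensitiveSet` are hypothetical names, and a `TreeSet` ordered by `String.CASE_INSENSITIVE_ORDER` is used purely as a stand-in for an ignore-case set, because it shows the same mismatch (case-insensitive membership, case-sensitive hashing).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class EqualsHashCodeContractCheck {

  /** Returns true when the pair respects the contract: equal objects must share a hash code. */
  static boolean hashConsistentWithEquals(Object a, Object b) {
    return !a.equals(b) || a.hashCode() == b.hashCode();
  }

  /** Stand-in (assumption, not CharArraySet): case-insensitive membership, case-sensitive hash. */
  static Set<String> caseInsensitiveSet(String... terms) {
    Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    set.addAll(List.of(terms));
    return set;
  }

  public static void main(String[] args) {
    Set<String> upper = caseInsensitiveSet("Hund");
    Set<String> lower = caseInsensitiveSet("hund");
    // The sets compare equal, yet their hash codes are sums of case-sensitive String hashes.
    System.out.println("equals: " + upper.equals(lower));
    System.out.println("hash consistent with equals: " + hashConsistentWithEquals(upper, lower));
  }
}
```

In the PR itself, the equivalent test would presumably build two `new CharArraySet(..., true)` instances from `"Hund"` and `"hund"` and assert both `assertEquals(a, b)` and `assertEquals(a.hashCode(), b.hashCode())`.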
##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -166,6 +167,31 @@ public Iterator<Object> iterator() {
return map.originalKeySet().iterator();
}
+ /** Returns {@code true} if this set matches entries case-insensitively. */
+ public boolean isIgnoreCase() {
+ return map.isIgnoreCase();
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (o == this) return true;
+ if (!(o instanceof CharArraySet other)) return false;
+ if (isIgnoreCase() != other.isIgnoreCase()) return false;
+ if (size() != other.size()) return false;
+ return containsAll(other);
+ }
+
+ @Override
+ public int hashCode() {
+ int h = Boolean.hashCode(isIgnoreCase());
+ for (char[] key : map.keys) {
+ if (key != null) {
+ h += Arrays.hashCode(key);
+ }
+ }
+ return h;
+ }
Review Comment:
For `ignoreCase=true`, `equals()` is case-insensitive, but `hashCode()`
currently uses `Arrays.hashCode(key)` which is case-sensitive for `char[]`. Two
sets that are equal under case-insensitive matching (e.g., containing
`"Test"` vs `"test"`) can end up with different hash codes, violating the
`equals`/`hashCode` contract. Compute each element’s hash using the same
case-folding logic used by `CharArrayMap` for ignore-case matching (i.e., a
case-insensitive hash of the character content) so `hashCode()` aligns with
`equals()`.
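
One possible shape for such a hash, as a standalone sketch rather than the actual fix: the class and method names (`CaseFoldedHashSketch`, `foldedHash`, `setHash`) are hypothetical, and folding each code point with `Character.toLowerCase` is an assumption about how `CharArrayMap` normalizes keys, so the real patch should reuse whatever folding the map already applies.

```java
import java.util.Arrays;
import java.util.List;

final class CaseFoldedHashSketch {

  /** Hashes the text after lower-casing each code point, so "Test" and "test" hash alike. */
  static int foldedHash(char[] text) {
    int h = 0;
    for (int i = 0; i < text.length; ) {
      int codePoint = Character.codePointAt(text, i);
      h = 31 * h + Character.toLowerCase(codePoint);
      i += Character.charCount(codePoint); // step over surrogate pairs as one unit
    }
    return h;
  }

  /** Order-independent set hash: the ignoreCase flag plus the sum of per-element hashes. */
  static int setHash(Iterable<char[]> elements, boolean ignoreCase) {
    int h = Boolean.hashCode(ignoreCase);
    for (char[] element : elements) {
      h += ignoreCase ? foldedHash(element) : Arrays.hashCode(element);
    }
    return h;
  }

  public static void main(String[] args) {
    List<char[]> upper = List.of("Test".toCharArray());
    List<char[]> lower = List.of("test".toCharArray());
    System.out.println("ignoreCase=true, hashes match:  " + (setHash(upper, true) == setHash(lower, true)));
    System.out.println("ignoreCase=false, hashes match: " + (setHash(upper, false) == setHash(lower, false)));
  }
}
```

Summing the per-element hashes keeps the result independent of iteration order, which matters because two equal sets may store their entries in different table slots.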
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]