Re: [PR] Fix CharArraySet equals/hashCode contract [lucene]

via GitHub Mon, 15 Jun 2026 11:45:23 -0700


Copilot commented on code in PR #16261:
URL: https://github.com/apache/lucene/pull/16261#discussion_r3415769402



##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -177,6 +203,7 @@ public String toString() {
         sb.append(item);
       }
     }
-    return sb.append(']').toString();
+    sb.append("](ignoreCase=").append(isIgnoreCase()).append(')');
+    return sb.toString();

Review Comment:
   CharArraySet#toString() now always appends the ignoreCase flag, which 
changes the output for the common ignoreCase=false case. This will break 
existing tests/usages that expect the previous format, e.g. 
TestCharArrayMap#testToString asserts cm.keySet().toString() == "[test]" 
(keySet() is a CharArraySet). Consider preserving the legacy output when 
ignoreCase=false and only appending the suffix when ignoreCase=true to minimize 
compatibility fallout.



##########
lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java:
##########
@@ -373,8 +373,34 @@ public void testContainsWithNull() {
 
   public void testToString() {
     CharArraySet set = CharArraySet.copy(Collections.singleton("test"));
-    assertEquals("[test]", set.toString());
+    assertEquals("[test](ignoreCase=false)", set.toString());

Review Comment:
   If CharArraySet#toString() preserves the legacy output for ignoreCase=false 
(recommended to avoid breaking other tests like TestCharArrayMap), then this 
assertion should continue to expect the original "[test]" format for the 
default case-sensitive set.



##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -166,6 +167,31 @@ public Iterator<Object> iterator() {
     return map.originalKeySet().iterator();
   }
 
+  /** Returns {@code true} if this set matches entries case-insensitively. */
+  public boolean isIgnoreCase() {
+    return map.isIgnoreCase();
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (o == this) return true;
+    if (!(o instanceof CharArraySet other)) return false;
+    if (isIgnoreCase() != other.isIgnoreCase()) return false;
+    if (size() != other.size()) return false;
+    return containsAll(other);
+  }
+
+  @Override
+  public int hashCode() {
+    int h = Boolean.hashCode(isIgnoreCase());
+    for (Object o : this) {
+      if (o instanceof char[] chars) {
+        h += Arrays.hashCode(chars);
+      }
+    }
+    return h;
+  }

Review Comment:
   hashCode() currently iterates via `for (Object o : this)`, which goes 
through AbstractMap#keySet() and clones each key char[] (see 
CharArrayMap.MapEntry#getKey). Since CharArraySet is often used for stopword 
sets, and the PR explicitly calls out using it as a map key, it’s worth 
avoiding the per-key allocations by iterating directly over the backing 
`map.keys` array.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Fix CharArraySet equals/hashCode contract [lucene]

Reply via email to