Copilot commented on code in PR #16261:
URL: https://github.com/apache/lucene/pull/16261#discussion_r3415769402
##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -177,6 +203,7 @@ public String toString() {
sb.append(item);
}
}
- return sb.append(']').toString();
+ sb.append("](ignoreCase=").append(isIgnoreCase()).append(')');
+ return sb.toString();
Review Comment:
CharArraySet#toString() now always appends the ignoreCase flag, which
changes the output for the common ignoreCase=false case. This will break
existing tests/usages that expect the previous format, e.g.
TestCharArrayMap#testToString asserts cm.keySet().toString() == "[test]"
(keySet() is a CharArraySet). Consider preserving the legacy output when
ignoreCase=false and only appending the suffix when ignoreCase=true to minimize
compatibility fallout.
##########
lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java:
##########
@@ -373,8 +373,34 @@ public void testContainsWithNull() {
public void testToString() {
CharArraySet set = CharArraySet.copy(Collections.singleton("test"));
- assertEquals("[test]", set.toString());
+ assertEquals("[test](ignoreCase=false)", set.toString());
Review Comment:
If CharArraySet#toString() preserves the legacy output for ignoreCase=false
(recommended to avoid breaking other tests like TestCharArrayMap), then this
assertion should continue to expect the original "[test]" format for the
default case-sensitive set.
##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -166,6 +167,31 @@ public Iterator<Object> iterator() {
return map.originalKeySet().iterator();
}
+ /** Returns {@code true} if this set matches entries case-insensitively. */
+ public boolean isIgnoreCase() {
+ return map.isIgnoreCase();
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (o == this) return true;
+ if (!(o instanceof CharArraySet other)) return false;
+ if (isIgnoreCase() != other.isIgnoreCase()) return false;
+ if (size() != other.size()) return false;
+ return containsAll(other);
+ }
+
+ @Override
+ public int hashCode() {
+ int h = Boolean.hashCode(isIgnoreCase());
+ for (Object o : this) {
+ if (o instanceof char[] chars) {
+ h += Arrays.hashCode(chars);
+ }
+ }
+ return h;
+ }
Review Comment:
hashCode() currently iterates via `for (Object o : this)`, which goes
through AbstractMap#keySet() and clones each key char[] (see
CharArrayMap.MapEntry#getKey). Since CharArraySet is often used for stopword
sets, and the PR explicitly calls out using it as a map key, it’s worth
avoiding the per-key allocations by iterating directly over the backing
`map.keys` array.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]