rdblue commented on a change in pull request #3530:
URL: https://github.com/apache/iceberg/pull/3530#discussion_r748897765
##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -261,7 +258,11 @@ public void close() {
@Override
protected boolean shouldKeep(T posDelete) {
- return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)) == 0;
+ return charSeqEquals(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete));
+ }
+
+ private boolean charSeqEquals(CharSequence s1, CharSequence s2) {
+ return s1 == s2 || (s1.length() == s2.length() && s1.toString().contentEquals(s2));
Review comment:
`contentEquals` is really just a slightly optimized version of what the
`CharSeqComparator` does.
The main difference is that it checks the length first, which is already done
here. The other difference is that it skips the `isHighSurrogate` checks, which
are only needed for ordering. I think we should avoid the extra high surrogate
checks, but avoiding them isn't worth the cost of converting to `String` every
time: `toString()` is a really expensive operation if `s1` isn't already a
`String`. I suspect that the test cases that showed a big difference used a
`String` as the first argument, and that this would perform really poorly
(possibly even worse than the comparator) with a non-String `s1`.
Instead, let's add a more optimized implementation that checks equality directly:
```java
public boolean equal(CharSequence s1, CharSequence s2) {
  if (s1 == s2) {
    return true;
  }

  if (s1.length() != s2.length()) {
    return false;
  }

  int len = s1.length();
  for (int i = 0; i < len; i += 1) {
    if (s1.charAt(i) != s2.charAt(i)) {
      return false;
    }
  }

  return true;
}
```
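To illustrate the non-String case, here is a minimal, self-contained sketch of how the method above behaves when the arguments are different `CharSequence` implementations. The class name `CharSeqEqualityDemo` and the sample paths are hypothetical and only for illustration; the method is made `static` so the sketch can run on its own, and it is not meant as the final shape of the change.

```java
import java.nio.CharBuffer;

// Hypothetical demo class; not part of the Iceberg code base.
final class CharSeqEqualityDemo {
  private CharSeqEqualityDemo() {
  }

  // Same logic as the suggested method: identity check, length check,
  // then a char-by-char comparison with no toString() conversion.
  static boolean equal(CharSequence s1, CharSequence s2) {
    if (s1 == s2) {
      return true;
    }

    int len = s1.length();
    if (len != s2.length()) {
      return false;
    }

    for (int i = 0; i < len; i += 1) {
      if (s1.charAt(i) != s2.charAt(i)) {
        return false;
      }
    }

    return true;
  }

  public static void main(String[] args) {
    // Made-up file locations held in three different CharSequence types.
    CharSequence asString = "s3://bucket/data/file-a.parquet";
    CharSequence asBuffer = CharBuffer.wrap("s3://bucket/data/file-a.parquet");
    CharSequence other = new StringBuilder("s3://bucket/data/file-b.parquet");

    // Equal content in different implementations compares as equal.
    System.out.println(equal(asBuffer, asString));
    // Same length, different content: the loop stops at the first mismatch.
    System.out.println(equal(asBuffer, other));
  }
}
```

Neither call allocates a new `String`, whichever argument comes first. Note that `String.equals` would not work here, because it returns `false` for any argument that is not itself a `String`.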