rdblue commented on a change in pull request #3530:
URL: https://github.com/apache/iceberg/pull/3530#discussion_r748897765



##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -261,7 +258,11 @@ public void close() {
 
     @Override
     protected boolean shouldKeep(T posDelete) {
-      return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)) == 0;
+      return charSeqEquals(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete));
+    }
+
+    private boolean charSeqEquals(CharSequence s1, CharSequence s2) {
+      return s1 == s2 || (s1.length() == s2.length() && s1.toString().contentEquals(s2));

Review comment:
   `contentEquals` is really just doing a slightly optimized version of what the `CharSeqComparator` does.
   
   The main difference is that it checks the length first, which is already done here. The other difference is the `isHighSurrogate` checks, which are only needed for ordering. I think we should avoid those extra surrogate checks, but avoiding them isn't worth the cost of converting `s1` to a `String` on every call, which is a really expensive operation (typically a full copy of the characters) if `s1` isn't already a `String`. I suspect that the test cases that showed a big difference used a `String` as the first argument, and that this would perform really poorly (possibly even worse) with a non-String `s1`.
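To make the `toString()` cost concrete, here is a minimal sketch that reuses the logic of the `charSeqEquals` helper from the diff and passes it a non-String `s1`. `CountingCharSequence`, `ToStringCostDemo`, and the sample paths are hypothetical and exist only for illustration; the wrapper simply counts how often `toString()` is called, and each call copies its buffer into a new `String`.

```java
// Hypothetical CharSequence (not part of Iceberg) that counts toString() calls
// so we can see how often the diff's helper materializes a new String.
final class CountingCharSequence implements CharSequence {
  private final StringBuilder value;
  private int toStringCalls = 0;

  CountingCharSequence(String initial) {
    this.value = new StringBuilder(initial);
  }

  int toStringCalls() {
    return toStringCalls;
  }

  @Override
  public int length() {
    return value.length();
  }

  @Override
  public char charAt(int index) {
    return value.charAt(index);
  }

  @Override
  public CharSequence subSequence(int start, int end) {
    return value.subSequence(start, end);
  }

  @Override
  public String toString() {
    toStringCalls += 1;
    return value.toString(); // StringBuilder.toString() copies into a new String
  }
}

final class ToStringCostDemo {
  // Same logic as charSeqEquals in the diff above.
  static boolean charSeqEquals(CharSequence s1, CharSequence s2) {
    return s1 == s2 || (s1.length() == s2.length() && s1.toString().contentEquals(s2));
  }

  public static void main(String[] args) {
    // Hypothetical paths; all three have the same length as dataLocation.
    CountingCharSequence dataLocation = new CountingCharSequence("s3://bucket/data/file-a.parquet");
    String[] deleteFileNames = {
        "s3://bucket/data/file-a.parquet",
        "s3://bucket/data/file-b.parquet",
        "s3://bucket/data/file-a.parquet"
    };
    for (String name : deleteFileNames) {
      charSeqEquals(dataLocation, name);
    }
    // One String copy per comparison that passes the length check.
    System.out.println(dataLocation.toStringCalls()); // prints 3
  }
}
```

The length check only short-circuits mismatched lengths; file paths in the same table often share a length, so in practice most comparisons would pay for a copy.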
   
   Instead, let's add a more optimized implementation to check equality here.
   
   ```java
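       // Equality-only check: compares chars directly, with no String
       // conversion and no surrogate handling (surrogates only matter for ordering).
       public boolean equal(CharSequence s1, CharSequence s2) {
         if (s1 == s2) {
           return true;
         }
   
         // sequences of different lengths can never be equal
         if (s1.length() != s2.length()) {
           return false;
         }
   
         // compare UTF-16 code units one at a time, stopping at the first mismatch
         int len = s1.length();
         for (int i = 0; i < len; i += 1) {
           if (s1.charAt(i) != s2.charAt(i)) {
             return false;
           }
         }
   
         return true;
       }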
   ```
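For reference, a self-contained sketch of the suggested check as a static utility, with a small usage example. The class name `CharSeqEquality` is hypothetical, and like the suggestion it does not handle `null` arguments. Because it compares UTF-16 code units directly, a surrogate pair stored in a `String` and in a `StringBuilder` still compares equal without any special casing.

```java
// Hypothetical standalone version of the suggested equality check.
final class CharSeqEquality {
  private CharSeqEquality() {
  }

  // Char-by-char equality; never calls toString(), so non-String inputs
  // such as StringBuilder are compared without copying.
  static boolean equal(CharSequence s1, CharSequence s2) {
    if (s1 == s2) {
      return true;
    }

    int len = s1.length();
    if (len != s2.length()) {
      return false;
    }

    for (int i = 0; i < len; i += 1) {
      if (s1.charAt(i) != s2.charAt(i)) {
        return false;
      }
    }

    return true;
  }

  public static void main(String[] args) {
    StringBuilder builder = new StringBuilder("file-a.parquet");
    System.out.println(equal(builder, "file-a.parquet")); // true
    System.out.println(equal(builder, "file-b.parquet")); // false
    // A surrogate pair (U+1F600) needs no special handling for equality.
    System.out.println(equal("\uD83D\uDE00", new StringBuilder("\uD83D\uDE00"))); // true
  }
}
```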
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
