findepi commented on a change in pull request #2849:
URL: https://github.com/apache/iceberg/pull/2849#discussion_r675023438
##########
File path: api/src/test/java/org/apache/iceberg/transforms/TestBucketing.java
##########
@@ -215,6 +215,24 @@ public void testString() {
hashBytes(asBytes), bucketFunc.hash(string));
}
+ @Test
+ public void testStringWithSurrogatePair() {
+ String string = "string with a surrogate pair: 💰";
+ Assert.assertNotEquals("string has no surrogate pairs", string.length(), string.codePoints().count());
+ byte[] asBytes = string.getBytes(StandardCharsets.UTF_8);
+
+ Bucket<CharSequence> bucketFunc = Bucket.get(Types.StringType.get(), 100);
+
+ Assert.assertEquals("String hash should match hash of UTF-8 bytes",
+ hashBytes(asBytes), bucketFunc.hash(string));
+
+ Assert.assertNotEquals("It looks like Guava has been updated and now contains a fix for " +
+ "https://github.com/google/guava/issues/5648. Please resolve the TODO in BucketString.hash " +
+ "and remove this assertion",
+ hashBytes(asBytes),
+ MURMUR3.hashString(string, StandardCharsets.UTF_8).asInt());
Review comment:
Finally, my IDE is no longer fighting me on this :)
Fixed.
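
For readers unfamiliar with the first assertion in the test above, here is a minimal, standalone sketch of why `string.length()` and `string.codePoints().count()` differ for this input. It uses only the JDK; the class and method names (`SurrogatePairDemo`, `utf16Units`, `codePoints`, `utf8Bytes`) are made up for illustration and are not part of Iceberg or Guava. It does not reproduce the Guava behavior tracked in the linked issue, only the string property the test relies on.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the pull request.
public class SurrogatePairDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static long codePoints(String s) {
        return s.codePoints().count();
    }

    // Number of bytes in the UTF-8 encoding of the string.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F4B0 (the money bag emoji used in the test), written as its
        // UTF-16 surrogate pair so the source file encoding does not matter.
        String moneyBag = "\uD83D\uDCB0";

        System.out.println("UTF-16 code units: " + utf16Units(moneyBag));
        System.out.println("code points:       " + codePoints(moneyBag));
        System.out.println("UTF-8 bytes:       " + utf8Bytes(moneyBag));
    }
}
```

Any code that walks a string one `char` at a time has to combine the two surrogates before encoding to UTF-8, which is presumably why the test compares against the hash of `string.getBytes(StandardCharsets.UTF_8)` as the reference value.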
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]