Hi,

@Mateusz Gajewski <mateusz.gajew...@starburstdata.com> discovered that Iceberg's bucketing transform for strings is not a regular Murmur3 32-bit hash.
Upon closer investigation, we found that the code at
https://github.com/apache/iceberg/blob/0c50b2074cd5dad59bbcb4b4599ec3ae11a34b49/api/src/main/java/org/apache/iceberg/transforms/Bucket.java#L239
is affected by Guava issue https://github.com/google/guava/issues/5648, which causes wrong results for input containing surrogate pairs (Unicode code points outside the Basic Multilingual Plane).

Assuming it is indeed a bug and it gets fixed (I posted a PR to Guava with the proposed fix), the fix can cause incorrect query results, since the bucketing function definition will effectively change for such inputs.

This is mostly an FYI, unless we can do something more about it.

Best,
PF
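
P.S. For anyone who wants to compare values, here is a minimal sketch of what a "regular" Murmur3 32-bit hash of a string would be: plain murmur3_x86_32 with seed 0 over the string's UTF-8 bytes. It is a pure-Python reference written for illustration and is not taken from Iceberg or Guava. The names murmur3_x86_32 and iceberg_bucket are made up for this example, and the bucket formula, (hash & Integer.MAX_VALUE) % N, is my reading of the Iceberg spec. The check value for "iceberg" is the one listed in the spec's hash test vectors. The sketch does not reproduce the Guava behaviour described above; it only gives a reference to compare against for strings containing code points outside the BMP.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_k(k: int) -> int:
    """Per-block scrambling step of murmur3_x86_32."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Reference murmur3_x86_32, returned as a signed 32-bit int (like a Java int)."""
    h = seed & MASK32
    full = len(data) - len(data) % 4
    # Body: consume 4-byte little-endian blocks.
    for i in range(0, full, 4):
        h ^= _mix_k(int.from_bytes(data[i:i + 4], "little"))
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1 to 3 bytes, also little-endian.
    tail = data[full:]
    if tail:
        h ^= _mix_k(int.from_bytes(tail, "little"))
    # Finalization (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def iceberg_bucket(value: str, num_buckets: int) -> int:
    """Assumed bucket formula: (hash(utf8 bytes) & Integer.MAX_VALUE) % N."""
    h = murmur3_x86_32(value.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    print(murmur3_x86_32("iceberg".encode("utf-8")))  # 1210000089

    # U+1F600 is outside the BMP: one code point, two UTF-16 code units
    # (a surrogate pair), four UTF-8 bytes.
    s = "a\U0001F600b"
    print(len(s.encode("utf-16-le")) // 2, "UTF-16 code units")
    print(len(s.encode("utf-8")), "UTF-8 bytes")
    print("reference bucket (N=16):", iceberg_bucket(s, 16))
```

Comparing the reference bucket for such a string with the bucket Iceberg actually assigned should show whether a given value is affected.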