Zoltán Borók-Nagy created IMPALA-14623:
------------------------------------------

             Summary: Use the raw bytes of the 128-bit Murmur hash of Iceberg 
file paths
                 Key: IMPALA-14623
                 URL: https://issues.apache.org/jira/browse/IMPALA-14623
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog, Frontend
            Reporter: Zoltán Borók-Nagy


Currently we use the following method to compute the hash of an Iceberg file path, which we then store as a String:
 
In IcebergUtil:
{noformat}
public static String getFilePathHash(String path) {
  Hasher hasher = Hashing.murmur3_128().newHasher();
  hasher.putUnencodedChars(path);
  return hasher.hash().toString();
}{noformat}
The hash is 16 raw bytes, but its hexadecimal String representation takes
2 * 16 = 32 characters. A char in Java is 2 bytes, so the String consumes
2 * 32 = 64 bytes of character data, which is 4 times more than needed.

For tables with a large number of files, this can cause significant memory overhead.
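
To make the size arithmetic concrete, here is a minimal sketch that uses only the JDK. The class and method names (FilePathHashSize, toHex, fromHex) are hypothetical, and the sample 16-byte array is a stand-in for a real Murmur3 hash, not actual Impala data. It mirrors the lowercase, two-characters-per-byte hex encoding that the String form uses and compares its length with the raw byte count. With Guava itself, HashCode.asBytes() returns the raw bytes directly, so a real change would presumably not need a decode step at all.

```java
public class FilePathHashSize {
  private static final char[] HEX = "0123456789abcdef".toCharArray();

  // Encodes bytes as lowercase hex, two characters per byte.
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(2 * bytes.length);
    for (byte b : bytes) {
      sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
    }
    return sb.toString();
  }

  // Decodes a hex string (even length assumed) back into raw bytes.
  static byte[] fromHex(String hex) {
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      int hi = Character.digit(hex.charAt(2 * i), 16);
      int lo = Character.digit(hex.charAt(2 * i + 1), 16);
      out[i] = (byte) ((hi << 4) | lo);
    }
    return out;
  }

  public static void main(String[] args) {
    // Stand-in for a 128-bit hash value: 0x00, 0x11, ..., 0xff.
    byte[] raw = new byte[16];
    for (int i = 0; i < raw.length; i++) {
      raw[i] = (byte) (i * 17);
    }
    String hex = toHex(raw);
    System.out.println("raw bytes:        " + raw.length);       // 16
    System.out.println("hex characters:   " + hex.length());     // 32
    System.out.println("at 2 bytes/char:  " + 2 * hex.length()); // 64
  }
}
```

These figures cover only the payload; the object headers of the String and of its backing array come on top of that in either representation.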



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
