MonkeyCanCode commented on PR #1966:
URL: https://github.com/apache/polaris/pull/1966#issuecomment-3034493090

   Nice work on this piece. The approach looks good. Here are my two cents:
   1. I think the word "random" may be a bit misleading, as the underlying hash 
function `murmur3_32_fixed` is deterministic.
   2. Could `murmur3_32_fixed` produce hash collisions, particularly within 
the 20 bits used for the prefix? For example, if two logically distinct 
tables, say `namespaceA.tableX` and `namespaceB.tableX`, happen to hash to 
the same 20-bit prefix, their data would be co-located under the same 
physical S3 directory for that prefix. The full S3 path would still be unique 
because the namespace and table name are appended, but is the shared prefix 
a concern for the optimized sibling check or other aspects of data management?
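
To make point 2 concrete, here is a rough sketch (not the code in this PR) of how often a 20-bit prefix would be shared. It assumes the prefix is the top 20 bits of a 32-bit Murmur3 hash of the UTF-8 table identifier with seed 0; which bits are taken, the encoding, and the seed are my assumptions, and `prefix20`, `collision_probability`, and `find_collision` are hypothetical helper names. The hash is a plain-Python Murmur3 x86 32-bit, used as a stand-in for Guava's `murmur3_32_fixed`.

```python
import math
from typing import Optional, Tuple

PREFIX_BITS = 20  # prefix width discussed in this comment


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Murmur3 x86 32-bit; deterministic for a given input and seed."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n = len(data)
    rounded = n - (n % 4)
    # Body: mix each 4-byte little-endian block into the state.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: mix the remaining 1 to 3 bytes, if any.
    tail = data[rounded:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def prefix20(identifier: str) -> int:
    """Top PREFIX_BITS bits of the hash (bit selection is an assumption)."""
    return murmur3_32(identifier.encode("utf-8")) >> (32 - PREFIX_BITS)


def collision_probability(n_tables: int, bits: int = PREFIX_BITS) -> float:
    """Birthday approximation of P(at least two tables share a prefix)."""
    return 1.0 - math.exp(-n_tables * (n_tables - 1) / (2.0 * (1 << bits)))


def find_collision(n_tables: int) -> Optional[Tuple[str, str]]:
    """Scan synthetic identifiers and return the first pair sharing a prefix."""
    seen = {}
    for i in range(n_tables):
        ident = f"ns{i}.tableX"  # made-up identifiers, for illustration only
        p = prefix20(ident)
        if p in seen:
            return seen[p], ident
        seen[p] = ident
    return None


if __name__ == "__main__":
    print(f"P(shared prefix among 1000 tables) ~ {collision_probability(1000):.3f}")
    print("example colliding pair:", find_collision(5000))
```

With only 2^20 (about one million) possible prefixes, shared prefixes become likely well before a catalog holds a million tables, so the question is less whether collisions happen and more whether anything relies on a prefix belonging to a single table.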
   

