adutra opened a new issue, #3621: URL: https://github.com/apache/polaris/issues/3621
### Is your feature request related to a problem? Please describe.

### Summary

The `DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED` feature configuration could benefit from improved documentation to clarify its purpose, limitations, and relationship with Iceberg's `write.object-storage.enabled` feature.

### Background

Polaris has a feature config called `DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED` introduced in bd8325208675c2b6505888cdd12d2c5abaa8dd2a. The feature name and current description may lead users to believe it provides similar functionality to Iceberg's [Object Store File Layout](https://iceberg.apache.org/docs/1.10.0/aws/#object-store-file-layout), but the two features work at different levels and are designed to be complementary.

### How the features differ

**Iceberg's `write.object-storage.enabled`** applies entropy (hash-based prefix) on a *per-file* basis. Each file gets a unique hash prefix.

**Polaris's `DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED`** applies entropy *once per table*, based on the table identifier. All files in the same table share the same hash.
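
The difference between the two levels can be sketched in a few lines of Python. This is only an illustration, not the Polaris or Iceberg implementation: the helper names (`entropy_prefix`, `table_location`, `data_file_path`), the use of SHA-256 as the hash, and the choice of `namespace.table` as the hashed key are all assumptions made for this sketch. Only the shape of the prefix (binary segments of 4, 4, 4, and 8 digits) is taken from the paths shown in this issue.

```python
import hashlib


def entropy_prefix(key: str) -> str:
    """Return a 20-bit binary prefix shaped like 0011/0100/1011/11101010.

    SHA-256 is a stand-in hash chosen for this sketch; it is not claimed
    to be the hash that Polaris or Iceberg actually use.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 24 bits of the digest and drop 4 to keep 20 bits.
    bits = format(int.from_bytes(digest[:3], "big") >> 4, "020b")
    return "/".join([bits[0:4], bits[4:8], bits[8:12], bits[12:20]])


def table_location(warehouse: str, namespace: str, table: str,
                   per_table_entropy: bool) -> str:
    """Per-table entropy: one prefix derived from the table identifier."""
    parts = [warehouse]
    if per_table_entropy:
        # Assumption: the hashed key is "namespace.table".
        parts.append(entropy_prefix(f"{namespace}.{table}"))
    parts += [namespace, table]
    return "/".join(parts)


def data_file_path(location: str, file_name: str,
                   per_file_entropy: bool) -> str:
    """Per-file entropy: a separate prefix derived from each file name."""
    parts = [location, "data"]
    if per_file_entropy:
        parts.append(entropy_prefix(file_name))
    parts.append(file_name)
    return "/".join(parts)


# Print the four combinations for two files in newdb.newtable.
for per_table in (False, True):
    for per_file in (False, True):
        print(f"per-table={per_table}, per-file={per_file}")
        loc = table_location("s3://bucket/warehouse", "newdb", "newtable", per_table)
        for name in ("file1.parquet", "file2.parquet"):
            print("  " + data_file_path(loc, name, per_file))
```

In this sketch the per-table prefix is computed once and is identical for every file in the table, while the per-file prefix is recomputed for each file name, which is why only the latter spreads a single table's files across the key space.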
### Example

Consider two data files in a table `newdb.newtable`:

**Standard layout (no entropy):**

```
s3://bucket/warehouse/newdb/newtable/data/file1.parquet
s3://bucket/warehouse/newdb/newtable/data/file2.parquet
```

**With Iceberg's object store layout only** (per-file entropy):

```
s3://bucket/warehouse/newdb/newtable/data/0011/0100/1011/11101010/file1.parquet
s3://bucket/warehouse/newdb/newtable/data/0011/0001/0001/00000001/file2.parquet
```

**With Polaris's object storage prefix only** (per-table entropy):

```
s3://bucket/warehouse/1111/1111/0100/01010000/newdb/newtable/data/file1.parquet
s3://bucket/warehouse/1111/1111/0100/01010000/newdb/newtable/data/file2.parquet
```

**With both features combined**:

```
s3://bucket/warehouse/1111/1111/0100/01010000/newdb/newtable/data/0011/0100/1011/11101010/file1.parquet
s3://bucket/warehouse/1111/1111/0100/01010000/newdb/newtable/data/0011/0001/0001/00000001/file2.parquet
```

### Describe the solution you'd like

The documentation should clarify:

1. Purpose: Polaris's layout distributes *different tables* across the key space, preventing hotspots when multiple tables in the same namespace are accessed concurrently. It does *not* distribute files within a single table.
2. Limitations: since all files in a table share the same prefix, this layout alone does not prevent hotspots when a single table receives heavy write traffic.
3. Complementary usage: as stated in the original commit, "The two features can and should be combined to achieve the best distribution of data files throughout the key space."

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
