alamb commented on issue #489:
URL: https://github.com/apache/parquet-format/issues/489#issuecomment-2833438755

   In my opinion, adding new index-like structures to the parquet spec makes 
sense when a "large" number of engines will support writing and using them. 
   
   Today it is possible to use such index structures without changing the spec 
in at least two ways:
   1. Store the index outside of the parquet files themselves (e.g. in a 
metadata store). Here is [an 
example](https://github.com/apache/datafusion/blob/74dc4196858784d7872b21bbfc97edc564e47c5e/datafusion-examples/examples/advanced_parquet_index.rs#L65-L77)
 of using an external index in Apache DataFusion.
   2. Store the index in the user-defined metadata (e.g. [key/value 
metadata](https://github.com/apache/parquet-format/blob/3ce0760933b875bc8a11f5be0b883cd107b95b43/src/main/thrift/parquet.thrift#L900)).
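
As a rough illustration of the two options above, here is a minimal sketch in plain Python. It is not taken from the linked DataFusion example and uses no Parquet library: the `FileIndexEntry` type, the helper names, and the file names are all hypothetical. It models a per-file min/max index that is serialized to a JSON string. That string could live in an external metadata store (option 1) or, as an assumption, be stored as the value of a key/value metadata entry (option 2).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FileIndexEntry:
    """Hypothetical index entry: min/max of one column for one Parquet file."""
    path: str
    min_value: int
    max_value: int


def build_index(file_stats):
    """Build the index from (path, min, max) tuples gathered at write time."""
    return [FileIndexEntry(path, lo, hi) for path, lo, hi in file_stats]


def serialize(index):
    """Encode the index as a JSON string, suitable for an external store
    or (by assumption) a key/value metadata value."""
    return json.dumps([asdict(entry) for entry in index])


def deserialize(payload):
    """Decode a JSON string produced by serialize()."""
    return [FileIndexEntry(**item) for item in json.loads(payload)]


def files_to_scan(index, lo, hi):
    """Return paths whose [min, max] range overlaps the query range [lo, hi]."""
    return [e.path for e in index if e.max_value >= lo and e.min_value <= hi]


# Made-up statistics for three files.
index = build_index([
    ("a.parquet", 0, 99),
    ("b.parquet", 100, 199),
    ("c.parquet", 200, 299),
])
payload = serialize(index)
restored = deserialize(payload)

# A query for values between 150 and 220 can skip a.parquet entirely.
print(files_to_scan(restored, 150, 220))
```

Either way, the engine reading the files has to know how to find and interpret the index, but the parquet spec itself does not need to change.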
   
   My suggestion is to postpone any changes to the parquet spec until there are 
several engines that use this type of index with parquet already. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


