Seems good. Personally, I think:

1. The Parquet file format seems to have an index page [1], but I don't know of anyone using it.
2. Currently, Parquet only has single-column bloom filters and the column index. Some kind of multi-column filter, or another kind of filter, might work.
3. An index can have different "levels": the Page Index is designed for the "Page", while bloom filters / statistics are for the RowGroup. We could even define an index for the "file".

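To make point 2 a bit more concrete, here is a rough sketch of what a multi-column filter per row group could look like. This is only an illustration under my own assumptions: `MultiColumnBloom` is a hypothetical toy class, it is not Parquet's split-block bloom filter, and nothing here reads or writes real Parquet files.

```python
import hashlib


class MultiColumnBloom:
    """Toy Bloom filter keyed on a tuple of column values (hypothetical, not the Parquet format)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, values):
        # Encode the whole tuple as one key, then derive bit positions by double hashing.
        key = repr(tuple(values)).encode("utf-8")
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, *values):
        for pos in self._positions(values):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, *values):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(values)
        )


# Made-up data: one filter per row group, rows are (country, device) pairs.
row_groups = [
    [("US", "ios"), ("DE", "android")],
    [("US", "android"), ("FR", "ios")],
]

filters = []
for rows in row_groups:
    bloom = MultiColumnBloom()
    for row in rows:
        bloom.add(*row)
    filters.append(bloom)

# Row groups that may contain a row with country = "US" AND device = "ios".
candidates = [i for i, bloom in enumerate(filters) if bloom.might_contain("US", "ios")]
print(candidates)
```

The second row group contains both "US" and "ios", just not in the same row, so two single-column filters could not rule it out. A filter over the pair can prune it, except in the case of a false positive.
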
Currently I don't know whether we can have some "official" sample index. Personally, I might be interested in some "sketches".

Best,
Xuwei Fu

[1] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L655

Andrew Lamb <andrewlam...@gmail.com> wrote on Wed, Jul 16, 2025 at 19:08:

> I wrote a blog with Qi Zhu, Jigao Luo explaining how to embed user defined
> indexes into Parquet files without needing any changes to the format[1].
>
> I am sorry for the somewhat shameless self promotion, but I think this
> topic may be of general interest to the community in the context of other
> extensions to the format we have discussed recently. Techniques such as
> this widen potential usecases of Parquet without any need for consensus or
> timeline for ecosystem adoption.
>
> Andrew
>
> [1]:
> https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
>