> 1. Parquet file format seems to have an index page [1], but I don't know who's
> using it.

The INDEX_PAGE type is a fascinating point -- I am not sure what the benefit
of writing indexes using that annotation would be 🤔

> Currently I don't know whether we can have some "official" sample index.

I am not sure examples need to be "official" -- I suspect people would be
interested in public open source examples of various types of indexes that
they could adapt to their own needs.

Andrew

On Wed, Jul 16, 2025 at 7:16 AM wish maple <maplewish...@gmail.com> wrote:

> Seems good. Personally I think
>
> 1. The Parquet file format seems to have an index page [1], but I don't know
>     who's using it.
> 2. Currently, Parquet only has single-column bloom filters and the column
>     index. Maybe some kind of multi-column or other filter might work.
> 3. Indexes can have different "levels": the Page Index is designed for
>     pages, and bloom filters / statistics for RowGroups. We could even
>     define an index for the whole "file".
>
> Currently I don't know whether we can have some "official" sample index.
> Personally I might be interested in some "sketches".
>
> Best,
> Xuwei Fu
>
> [1]
>
> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L655
>
> On Wed, Jul 16, 2025 at 19:08 Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > I wrote a blog post with Qi Zhu and Jigao Luo explaining how to embed
> > user-defined indexes into Parquet files without needing any changes to the
> > format [1].
> >
> > I am sorry for the somewhat shameless self promotion, but I think this
> > topic may be of general interest to the community in the context of other
> > extensions to the format we have discussed recently. Techniques such as
> > this widen the potential use cases of Parquet without any need for
> > consensus or a timeline for ecosystem adoption.
> >
> > Andrew
> >
> > [1]:
> >
> > https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
> >
>
