On Thu, Jul 17, 2025 at 8:18 AM Andrew Lamb <andrewlam...@gmail.com> wrote:

> > 1. Parquet file format seems to have index page [1], but I don't know who's
>
> The INDEX_PAGE type is a fascinating point -- I am not sure what benefit
> writing indexes using that annotation would provide 🤔
>
> > Currently I don't know whether we can have some "official" sample index.
>
> I am not sure examples need to be "official" -- I suspect people would be
> interested in public open source examples of various types of indexes that
> they could adapt to their own needs.
>

Having at least a central location with namespaced index names and links to
descriptions of their implementations would help more libraries leverage
these indexes. Without standardization, only the libraries that produced
the files can use them, which somewhat defeats the purpose of using an
open format.

--
Felipe


> Andrew
>
> On Wed, Jul 16, 2025 at 7:16 AM wish maple <maplewish...@gmail.com> wrote:
>
> > Seems good. Personally I think
> >
> > 1. Parquet file format seems to have index page [1], but I don't know who's
> >     using it.
> > 2. Currently, Parquet only has single-column bloom filters and the column
> >     index. Maybe some kind of multi-column or other filter might work.
> > 3. Indexes can have different "levels": the Page Index is designed for
> >     "Page", and bloom filter / statistics for RowGroup. We can even define
> >     an index for "file".
> >
> > Currently I don't know whether we can have some "official" sample index.
> > Personally I might be interested in some "sketches".
> >
> > Best,
> > Xuwei Fu
> >
> > [1]
> > https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L655
> >
> > Andrew Lamb <andrewlam...@gmail.com> wrote on Wed, Jul 16, 2025 at 19:08:
> >
> > > I wrote a blog post with Qi Zhu and Jigao Luo explaining how to embed
> > > user-defined indexes into Parquet files without needing any changes to
> > > the format [1].
> > >
> > > I am sorry for the somewhat shameless self-promotion, but I think this
> > > topic may be of general interest to the community in the context of other
> > > extensions to the format we have discussed recently. Techniques such as
> > > this widen the potential use cases of Parquet without any need for
> > > consensus or a timeline for ecosystem adoption.
> > >
> > > Andrew
> > >
> > > [1]:
> > > https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/
> > >
> >
>
