Hi Jingsong,

I have split PIP-41 into two proposals: PIP-41: Introduce FilePath Global Index And Optimizations For Lookup In Append Table [1] and PIP-42: Local Sort And Parquet Lookup Optimizations For In Append Table [2].

The Google Docs are:
https://docs.google.com/document/d/1qV7lUW5GsZ72IuF9FV96fX3wtJO_gzEWY6aLG97caxU/edit?tab=t.0
https://docs.google.com/document/d/1M6R5V6zRwlj0LeuKnhZcUK4Ss-9JLs9pF42n822k-Bo/edit?tab=t.0
[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table
[2] https://cwiki.apache.org/confluence/display/PAIMON/PIP-42%3A+Local+Sort+And+Parquet+Lookup+Optimizations+For+In+Append+Table

Best,
FangYong

On Wed, Jan 14, 2026 at 8:31 PM lei li <[email protected]> wrote:

> Hi Yong,
>
> Thank you for sharing this fascinating PIP proposal!
>
> I'm particularly intrigued by the sort buffer concept for Append tables.
> I'd like to ask: does the current sorting process only support multi-field
> sequential sorting, or would it be possible to introduce more advanced
> sorting strategies, such as Z-order sorting or other space-filling curve
> algorithms?
>
> The reason I'm curious about this is that in high-volume batch write
> scenarios, having the data already sorted upon completion of the batch
> write could significantly improve query efficiency. Z-order sorting, for
> instance, could provide better data locality for multi-dimensional range
> queries, which are quite common in analytical workloads.
>
> Would love to hear your thoughts on whether such sorting enhancements
> align with the current design goals! Looking forward to your insights and
> the continued development of this proposal!
>
> Best regards,
>
> Lei Li
>
> On Jan 13, 2026, at 17:32, Yong Fang <[email protected]> wrote:
>
> Hi devs,
>
> I'd like to initiate a discussion on PIP-41: Introduce FilePath Global
> Index And Optimizations For Lookup In Append Table [1].
>
> We use Paimon as cold storage for sample data of businesses such as search,
> recommendation, and advertising. Given the extremely large volume of sample
> data, we adopt Append Tables for data storage.
>
> Batch jobs read and process this sample data, and lookup-by-key
> capability for historical data is also required during the sample data
> processing.
> To support hybrid queries for such ultra-high-dimensional data
> in Paimon, we introduce the FilePath Global Index, along with optimizations
> for reading Parquet metadata and Parquet file data in Append Tables (some
> of the major optimization designs come from our partner teams, cc @lingpeng,
> @guanziyue, thx), aiming to enhance the lookup capability of Paimon Append
> Tables.
>
> Looking forward to hearing from you, thanks
>
> [1]
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table
>
> Best,
> Fang Yong
