Re: [DISCUSS] PIP-41: Introduce FilePath Global Index And Optimizations For Lookup In Append Table

Yong Fang Wed, 14 Jan 2026 22:57:47 -0800

Hi Jingsong,

I have split the original proposal into two: PIP-41: Introduce FilePath
Global Index And Optimizations For Lookup In Append Table [1] and PIP-42:
Local Sort And Parquet Lookup Optimizations For In Append Table [2]. The
Google Docs are at
https://docs.google.com/document/d/1qV7lUW5GsZ72IuF9FV96fX3wtJO_gzEWY6aLG97caxU/edit?tab=t.0
and
https://docs.google.com/document/d/1M6R5V6zRwlj0LeuKnhZcUK4Ss-9JLs9pF42n822k-Bo/edit?tab=t.0


[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table
[2]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-42%3A+Local+Sort+And+Parquet+Lookup+Optimizations+For+In+Append+Table

Best,
FangYong

On Wed, Jan 14, 2026 at 8:31 PM lei li <[email protected]> wrote:

> Hi Yong,
>
>
> Thank you for sharing this fascinating PIP proposal!
>
>
> I'm particularly intrigued by the sort buffer concept for Append tables.
> I'd like to ask: does the current sorting process only support multi-field
> sequential sorting, or would it be possible to introduce more advanced
> sorting strategies, such as Z-order sorting or other space-filling curve
> algorithms?
>
>
> The reason I'm curious about this is that in high-volume batch write
> scenarios, having the data already sorted upon completion of the batch
> write could significantly improve query efficiency. Z-order sorting, for
> instance, could provide better data locality for multi-dimensional range
> queries, which is quite common in analytical workloads.
>
>
> Would love to hear your thoughts on whether such sorting enhancements
> align with the current design goals! Looking forward to your insights and
> the continued development of this proposal!
>
>
> Best regards,
>
> Lei Li
>
> On Jan 13, 2026, at 17:32, Yong Fang <[email protected]> wrote:
>
> Hi devs,
>
> I'd like to initiate a discussion on PIP-41: Introduce FilePath Global
> Index And Optimizations For Lookup In Append Table [1].
>
> We use Paimon as cold storage for sample data of businesses such as search,
> recommendation, and advertising. Given the extremely large volume of sample
> data, we adopt Append Tables for data storage.
>
> Batch jobs read and process this sample data, and that processing also
> needs to look up historical data by key. To support such hybrid queries
> over ultra-high-dimensional data in Paimon, we introduce the FilePath
> Global Index, along with optimizations for reading Parquet metadata and
> Parquet file data in Append Tables, aiming to enhance the lookup
> capability of Paimon Append Tables. (Some of the major optimization
> designs come from our partner teams, cc @lingpeng and @guanziyue, thanks.)
>
> Looking forward to hearing from you. Thanks!
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table
>
> Best,
> Fang Yong
>
>
