Hi devs, I'd like to initiate a discussion on PIP-41: Introduce FilePath Global Index And Optimizations For Lookup In Append Table [1].
We use Paimon as cold storage for sample data from businesses such as search, recommendation, and advertising. Given the extremely large volume of sample data, we store it in Append Tables. Batch jobs read and process this sample data, and looking up historical data by key is also required during processing.

To support hybrid queries over such ultra-high-dimensional data in Paimon, we introduce the FilePath Global Index, along with optimizations for reading Parquet metadata and Parquet file data in Append Tables (some of the major optimization designs come from our partner teams, cc @lingpeng, @guanziyue, thanks). Together these aim to improve the lookup capability of Paimon Append Tables.

Looking forward to hearing from you, thanks.

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-41%3A+Introduce+FilePath+Global+Index+And+Optimizations+For+Lookup+In+Append+Table

Best,
Fang Yong
