Thanks, Yong, for driving this PIP! It looks very nice!
I have two considerations:

1. I am currently working on introducing a SortLookupStoreFactory. My idea is to first gain practical experience with local lookup to clarify what file format we need. Only once we feel the format is mature (that is, its performance is fully acceptable) should we settle on its specific structure.

2. The format may not end up being called HFile. If it differs from HFile, we can give it another name.

What do you think?

Best,
Jingsong

On Wed, Jul 17, 2024 at 9:48 AM Yong Fang <zjur...@gmail.com> wrote:
>
> Hi devs,
>
> I and LiMing would like to initiate a discussion on PIP-25: Introduce HFile
> format for Paimon primary key table [1]. Currently, when Paimon requires
> creating lookup tables for lookup joins in streaming processes, it reads
> data from ORC/Parquet/Avro format files in HDFS/S3, converts records to
> key-value format data, and writes them to disk. This process consumes a
> substantial amount of time.
>
> We aim to introduce the hfile format into Paimon in order to reduce the
> cost of creating lookup tables. Users can take advantage of this file
> format for Paimon primary key tables when using Paimon as a lookup table.
> In this case, Paimon will create lookup tables based on hfile files without
> rebuilding key-value files.
>
> Looking forward to your feedback, thanks.
>
> [1]
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table
>
> Best,
> Fang Yong