I'm curious which parts of the current design are the same as HBase's HFile and which are different, and would suggest adding a description of this to the document [1].

nit: I notice there is a link to the HBase book [2] at the bottom, but I didn't find any reference to it in the PIP doc.

Best Regards,
Yu

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table
[2] https://hbase.apache.org/book.html#_hfile_format_2

On Wed, 17 Jul 2024 at 10:42, Jingsong Li <jingsongl...@gmail.com> wrote:

> Thanks Yong for driving this PIP!
>
> This PIP looks very nice!
>
> I have two considerations:
>
> 1. I am currently working on introducing a SortLookupStoreFactory. My
> idea is to first practice using local lookup to clarify what file
> format we need. Only when we feel that the format is mature (the
> performance is fully OK), then we can determine the specific structure
> of the format.
>
> 2. This format may not be called HFile. If it is different from HFile,
> we can give it another name.
>
> What do you think?
>
> Best,
> Jingsong
>
> On Wed, Jul 17, 2024 at 9:48 AM Yong Fang <zjur...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > I and LiMing would like to initiate a discussion on PIP-25: Introduce HFile
> > format for Paimon primary key table [1]. Currently, when Paimon requires
> > creating lookup tables for lookup joins in streaming processes, it reads
> > data from ORC/Parquet/Avro format files in HDFS/S3, converts records to
> > key-value format data, and writes them to disk. This process consumes a
> > substantial amount of time.
> >
> > We aim to introduce the hfile format into Paimon in order to reduce the
> > cost of creating lookup tables. Users can take advantage of this file
> > format for Paimon primary key tables when using Paimon as a lookup table.
> > In this case, Paimon will create lookup tables based on hfile files without
> > rebuilding key-value files.
> >
> > Looking forward to your feedback, thanks.
> >
> > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table
> >
> > Best,
> > Fang Yong