I'm curious which parts of the current design are the same as HBase's
HFile and which are different, and would suggest adding a description of
this to the document [1].

nit: I noticed a link to the HBase book [2] at the bottom, but couldn't
find any reference to it in the PIP doc.

Best Regards,
Yu

[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table
[2] https://hbase.apache.org/book.html#_hfile_format_2


On Wed, 17 Jul 2024 at 10:42, Jingsong Li <jingsongl...@gmail.com> wrote:

> Thanks Yong for driving this PIP!
>
> This PIP looks very nice!
>
> I have two considerations:
>
> 1. I am currently working on introducing a SortLookupStoreFactory. My
> idea is to first experiment with local lookup to clarify what file
> format we need. Only once we feel that the format is mature (its
> performance is fully acceptable) can we determine the specific structure
> of the format.
>
> 2. This format may not be called HFile. If it is different from HFile,
> we can give it another name.
>
> What do you think?
>
> Best,
> Jingsong
>
> On Wed, Jul 17, 2024 at 9:48 AM Yong Fang <zjur...@gmail.com> wrote:
> >
> > Hi devs,
> >
> > LiMing and I would like to initiate a discussion on PIP-25: Introduce HFile
> > format for Paimon primary key table [1]. Currently, when Paimon needs to
> > create lookup tables for lookup joins in streaming processes, it reads
> > data from ORC/Parquet/Avro format files in HDFS/S3, converts the records to
> > key-value format, and writes them to disk. This process consumes a
> > substantial amount of time.
> >
> > We aim to introduce the HFile format into Paimon in order to reduce the
> > cost of creating lookup tables. Users can take advantage of this file
> > format for Paimon primary key tables when using Paimon as a lookup table.
> > In this case, Paimon will create lookup tables based on HFile files
> > without rebuilding key-value files.
> >
> > Looking forward to your feedback, thanks.
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table
> >
> > Best,
> > Fang Yong
>
