Hi devs,

LiMing and I would like to initiate a discussion on PIP-25: Introduce HFile
format for Paimon primary key table [1]. Currently, when Paimon needs to
build a lookup table for a lookup join in a streaming job, it reads data
from ORC/Parquet/Avro files on HDFS/S3, converts the records to key-value
format, and writes them to disk. This process takes a substantial amount
of time.

We aim to introduce the HFile format into Paimon in order to reduce the
cost of creating lookup tables. Users can choose this file format for
Paimon primary key tables that are used as lookup tables. In that case,
Paimon will build the lookup table directly from the HFile files, without
rebuilding key-value files.
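
To make the difference concrete, here is a rough sketch of the two paths. It
is only a toy model in Python, not Paimon code and not the design in the PIP:
the names `build_lookup_store` and `SortedKeyValueFile` are hypothetical, and
the in-memory lists merely stand in for data files.

```python
import bisect

# Hypothetical stand-in for rows decoded from an ORC/Parquet/Avro data file.
rows = [
    {"id": 3, "name": "c"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
]


def build_lookup_store(rows, key_field):
    """Current path: a full pass over the rows to build a key-value store."""
    return {row[key_field]: row for row in rows}


class SortedKeyValueFile:
    """Toy model of a file whose entries are already sorted by key.

    Because the keys are ordered, a point lookup is a binary search and
    no conversion pass is needed before serving lookups.
    """

    def __init__(self, sorted_entries):
        self.keys = [k for k, _ in sorted_entries]
        self.values = [v for _, v in sorted_entries]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


# Current path: convert first, then look up.
store = build_lookup_store(rows, "id")

# Proposed path (sketch): look up directly in the key-sorted data.
kv_file = SortedKeyValueFile([(1, "a"), (2, "b"), (3, "c")])
```

In the first path the cost of the conversion pass grows with the size of the
table; in the second, the file layout itself serves the lookups.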

Looking forward to your feedback, thanks.

[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table

Best,
Fang Yong
