Hi devs,

LiMing and I would like to start a discussion on PIP-25: Introduce a
key-value file format for Paimon primary key tables [1]. Currently, when
Paimon needs to create lookup tables for lookup joins in streaming jobs, it
reads data from ORC/Parquet/Avro files in HDFS/S3, converts the records into
key-value data, and writes them to local disk. This process is
time-consuming.

JingSong Lee has added a basic sort lookup store to Paimon, and we aim to
build the new key-value file format on top of it to reduce the cost of
creating lookup tables. Users can choose this file format for Paimon primary
key tables that are used as lookup tables. In that case, Paimon will build
lookup tables directly from the sorted store files instead of rebuilding
key-value files.

Looking forward to your feedback, thanks.

[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table

Best,
Fang Yong
