Hi devs,

LiMing and I would like to start a discussion on PIP-25: Introduce a key-value file format for Paimon primary key tables [1].

Currently, when Paimon needs to create lookup tables for lookup joins in streaming jobs, it reads data from ORC/Parquet/Avro files in HDFS/S3, converts the records to key-value format, and writes them to disk. This process takes a substantial amount of time.

JingSong Lee has added a basic sort lookup store to Paimon, and we aim to build the new key-value file format on top of it in order to reduce the cost of creating lookup tables. Users can choose this file format for Paimon primary key tables that are used as lookup tables. In that case, Paimon will create lookup tables directly from the sorted store files, without rebuilding key-value files.

Looking forward to your feedback, thanks.

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table

Best,
Fang Yong