Thanks Yong and Ming! We can act on this and improve the structure of the sort file based on performance testing.
Let's strive to replace the hash file as the default option as soon as possible.

Best,
Jingsong

On Mon, Jul 22, 2024 at 10:58 AM Yong Fang <[email protected]> wrote:
>
> Hi devs,
>
> LiMing and I would like to initiate a discussion on PIP-25: Introduce a
> key-value file format for Paimon primary key tables [1]. Currently, when
> Paimon needs to create lookup tables for lookup joins in streaming jobs,
> it reads data from ORC/Parquet/Avro files in HDFS/S3, converts the records
> to key-value format, and writes them to disk. This process takes a
> substantial amount of time.
>
> JingSong Lee has added a basic sort lookup store to Paimon, and we aim to
> build the new key-value file format on top of it in order to reduce the
> cost of creating lookup tables. Users can adopt this file format for
> Paimon primary key tables when using Paimon as a lookup table. In that
> case, Paimon will create lookup tables directly from the sorted store
> files without rebuilding key-value files.
>
> Looking forward to your feedback, thanks.
>
> [1]
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table
>
> Best,
> Fang Yong
