Hi devs,

LiMing and I would like to start a discussion on PIP-25: Introduce HFile format for Paimon primary key table [1].

Currently, when Paimon needs to create lookup tables for lookup joins in streaming jobs, it reads data from ORC/Parquet/Avro files in HDFS/S3, converts the records to key-value format, and writes them to disk. This process takes a substantial amount of time.

We aim to introduce the HFile format into Paimon to reduce the cost of creating lookup tables. Users can choose this file format for Paimon primary key tables that are used as lookup tables. In that case, Paimon will build lookup tables directly from the HFile files, without rebuilding key-value files.

Looking forward to your feedback, thanks.

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table

Best,
Fang Yong
