Hi devs,

I'm proposing a new feature to introduce range partitioning and sorting in append scalable table batch writing for Flink. The goal is to improve query performance by reducing the amount of data scanned on large datasets.

The proposal includes:

1. Configurable range partitioning and sorting during data writing, which allows for a more efficient data distribution strategy.

2. Introduction of new configurations that enable users to specify the columns used for comparison, choose a comparison algorithm for range partitioning, and further sort each partition if required (a configuration sketch follows the list).

3. A detailed explanation of the division of processing steps when range partitioning is enabled, and of the conditional inclusion of the sorting phase (see the pipeline sketch after the list).

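To make the second point a bit more concrete, here is a minimal sketch of what such write options could look like when set as table properties. The option keys and values below are purely illustrative placeholders made up for this mail, not the names the PIP will actually propose.

import java.util.HashMap;
import java.util.Map;

// Illustrative only: these option keys are hypothetical placeholders,
// not the configuration names that PIP-21 will define.
public class RangePartitionOptionsSketch {

    public static Map<String, String> exampleSinkOptions() {
        Map<String, String> options = new HashMap<>();
        // Columns whose values are compared to decide which range a record falls into.
        options.put("sink.range-partition.columns", "dt,user_id");
        // Comparison algorithm used to build the ranges (hypothetical value).
        options.put("sink.range-partition.strategy", "sample-based");
        // Whether each range should additionally be sorted before it is written out.
        options.put("sink.range-partition.sort-in-partition", "true");
        return options;
    }

    public static void main(String[] args) {
        exampleSinkOptions().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}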


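Similarly, for the third point, the following is a rough, single-process sketch of the conceptual stages: sample the comparison columns to derive range boundaries, route each record to its range, and, only when in-partition sorting is requested, sort the records of each range before they are written. It is meant only to illustrate the division of steps, not the actual Flink operator topology the PIP will describe.

import java.util.*;
import java.util.stream.Collectors;

// Conceptual, single-process sketch of the stages behind range partitioning
// and optional in-partition sorting; not the actual Flink operators.
public class RangePartitionPipelineSketch {

    // Stage 1: derive range boundaries from a sample of the sort keys.
    static List<Integer> computeBoundaries(List<Integer> sampleKeys, int numRanges) {
        List<Integer> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted);
        List<Integer> boundaries = new ArrayList<>();
        for (int i = 1; i < numRanges; i++) {
            boundaries.add(sorted.get(i * sorted.size() / numRanges));
        }
        return boundaries;
    }

    // Stage 2: assign each record to the range its key falls into.
    static int assignRange(int key, List<Integer> boundaries) {
        int range = 0;
        while (range < boundaries.size() && key >= boundaries.get(range)) {
            range++;
        }
        return range;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(42, 7, 99, 15, 63, 3, 88, 27, 51, 70);
        List<Integer> boundaries = computeBoundaries(keys, 3);

        // Group records by their assigned range.
        Map<Integer, List<Integer>> ranges = keys.stream()
                .collect(Collectors.groupingBy(k -> assignRange(k, boundaries)));

        // Stage 3 (conditional): sort inside each range only if configured.
        boolean sortInPartition = true;
        ranges.forEach((range, records) -> {
            if (sortInPartition) {
                Collections.sort(records);
            }
            System.out.println("range " + range + " -> " + records);
        });
    }
}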

Looking forward to discussing this in the upcoming PIP [1].

Best regards,

Wencong Liu

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-21%3A+Introduce+Range+Partition+And+Sort+in+Append+Scalable+Table+Batch+Writing+for+Flink
