[PR] [core] Introduce a basic SortLookupStoreFactory [paimon]

via GitHub Wed, 17 Jul 2024 00:58:11 -0700


JingsongLi opened a new pull request, #3770:
URL: https://github.com/apache/paimon/pull/3770


   <!-- Please specify the module before the PR name: [core] ... or [flink] ... 
-->
   
   ### Purpose
   
   <!-- Linking this pull request to the issue -->
   Multiple experiments have shown that the performance bottleneck of local 
lookup files lies in disk reads and writes, and that the file format affects 
file size. Generally speaking, sort-based implementations can achieve 
relatively high compression ratios.
   
   This PR introduces a sort-based implementation that is very similar to the 
file read and write implementation of LevelDB, with two differences:
   1. It does not apply prefix compression to keys, because `zstd` compression 
already exploits this kind of redundancy, and our current key format (see 
`RowCompactedSerializer`) is not friendly to prefix compression.
   2. It introduces an alignment index for blocks, which avoids storing the 
length of each key-value entry.
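
A minimal sketch of one way to read the alignment index idea in item 2, not the format this PR actually writes. It assumes each key-value entry is already serialized into an opaque `byte[]` record. The class `AlignedBlockSketch`, its methods, and the trailer layout (index, then entry count, then a one-byte flag) are all hypothetical. When every record in a block has the same size, the trailer stores that single size. Otherwise it stores one start offset per record. In both cases no per-record length is written next to the data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a block with an "aligned" index; not Paimon's actual layout. */
final class AlignedBlockSketch {

    /** Packs records back to back, then appends a trailer: index, record count, aligned flag. */
    public static byte[] build(List<byte[]> records) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        boolean aligned = true;
        for (byte[] r : records) {
            offsets.add(data.size());                      // start offset of this record
            aligned &= r.length == records.get(0).length;  // stays true only if all sizes match
            data.write(r, 0, r.length);
        }
        int indexBytes = aligned ? 4 : 4 * offsets.size();
        ByteBuffer out = ByteBuffer.allocate(data.size() + indexBytes + 5);
        out.put(data.toByteArray());
        if (aligned) {
            // Aligned block: one shared record size replaces the whole offset array.
            out.putInt(records.isEmpty() ? 0 : records.get(0).length);
        } else {
            // Unaligned block: one start offset per record.
            for (int o : offsets) {
                out.putInt(o);
            }
        }
        out.putInt(records.size());
        out.put((byte) (aligned ? 1 : 0));
        return out.array();
    }

    /** Returns the i-th record, locating it through the trailer only. */
    public static byte[] get(byte[] block, int i) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        boolean aligned = buf.get(block.length - 1) == 1;
        int count = buf.getInt(block.length - 5);
        if (i < 0 || i >= count) {
            throw new IndexOutOfBoundsException("record " + i + " of " + count);
        }
        int start;
        int end;
        if (aligned) {
            int size = buf.getInt(block.length - 9);
            start = i * size;
            end = start + size;
        } else {
            int indexStart = block.length - 5 - 4 * count;
            start = buf.getInt(indexStart + 4 * i);
            // The last record ends where the offset array begins.
            end = (i + 1 < count) ? buf.getInt(indexStart + 4 * (i + 1)) : indexStart;
        }
        byte[] record = new byte[end - start];
        System.arraycopy(block, start, record, 0, record.length);
        return record;
    }
}
```

Because records in a sorted block can be addressed by position, a reader could binary-search the block with `get(block, mid)` and compare keys, which is the access pattern a sorted lookup file relies on.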
   
   <!-- What is the purpose of the change -->
   
   ### Tests
   
   <!-- List UT and IT cases to verify this change -->
   
   ### API and Format
   
   <!-- Does this change affect API or storage format -->
   
   This PR does not yet fully implement sort-based files, so it only adds an 
option, which defaults to the HASH implementation.
   
   ### Documentation
   
   <!-- Does this change introduce a new feature -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
