JingsongLi opened a new pull request, #3770: URL: https://github.com/apache/paimon/pull/3770
### Purpose

Multiple experiments have shown that the performance bottleneck for local files during lookup is disk reads and writes, and that the file format affects file size. Generally speaking, sort-based implementations can achieve relatively high compression ratios.

This PR introduces a sort-based implementation that closely resembles LevelDB's file read and write implementation, with two differences:

1. It does not apply prefix compression to keys, because `zstd` compression already handles that kind of redundancy, and our current key format (see `RowCompactedSerializer`) is not friendly to prefix compression.
2. It introduces an alignment index for blocks, which avoids storing the length of each key-value entry.

### Tests

### API and Format

This PR does not yet fully implement sort-based files, so it only adds an option, which defaults to the HASH implementation.

### Documentation
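To illustrate the block alignment idea from the Purpose section, here is a minimal sketch in Python. It is not the code in this PR: the trailer layout, the function names `build_aligned_block` and `lookup`, and the restriction to fixed-width keys and values are all assumptions made for the example. The point it shows is that when every entry in a sorted block has the same size, the block can record that size once and compute entry offsets arithmetically, instead of storing a length per entry.

```python
import struct

# Hypothetical trailer: key width and value width as two little-endian uint32.
TRAILER = struct.Struct("<II")


def build_aligned_block(entries, key_size, value_size):
    """Pack sorted, fixed-width (key, value) byte pairs back to back.

    Assumed layout for illustration only: entries first, then one trailer
    holding the two widths. No per-entry length is written.
    """
    if key_size <= 0:
        raise ValueError("key_size must be positive")
    body = bytearray()
    previous_key = None
    for key, value in entries:
        if len(key) != key_size or len(value) != value_size:
            raise ValueError("aligned block needs fixed-width keys and values")
        if previous_key is not None and key <= previous_key:
            raise ValueError("keys must be strictly increasing")
        body += key + value
        previous_key = key
    return bytes(body) + TRAILER.pack(key_size, value_size)


def lookup(block, key):
    """Binary-search the block; entry i starts at offset i * record_size."""
    key_size, value_size = TRAILER.unpack_from(block, len(block) - TRAILER.size)
    record_size = key_size + value_size
    count = (len(block) - TRAILER.size) // record_size
    low, high = 0, count - 1
    while low <= high:
        mid = (low + high) // 2
        offset = mid * record_size
        candidate = block[offset:offset + key_size]
        if candidate == key:
            return block[offset + key_size:offset + record_size]
        if candidate < key:
            low = mid + 1
        else:
            high = mid - 1
    return None


# Big-endian keys so that byte order matches numeric order.
entries = [(struct.pack(">I", k), struct.pack(">Q", k * 10)) for k in (1, 5, 9)]
block = build_aligned_block(entries, key_size=4, value_size=8)
found = lookup(block, struct.pack(">I", 5))
missing = lookup(block, struct.pack(">I", 7))
```

In this sketch the index overhead is a constant eight bytes per block regardless of entry count. A real format would presumably fall back to an explicit offset index for blocks whose entries differ in size.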
