steFaiz opened a new pull request, #6796: URL: https://github.com/apache/paimon/pull/6796
### Purpose

Linked issue: https://github.com/apache/paimon/issues/6734

This PR introduces SST file format VERSION 1.

> NOTE: The current SST file format has no VERSION field, so this version is not compatible with it (keeping compatibility would, at the least, be hard).

#### Core improvements

1. Introduce a leveled data index: the user can specify a `MaxIndexBlockSize`. If the index block's memory usage exceeds this threshold, the index block is spilled to the SST file as part of a B-tree-like structure. The reader loads only the root index when opening the file.
2. Introduce a new FileInfo block that contains some stats; users are free to add new key-value pairs to it.
3. Modify the footer structure:
   1. Add stats such as uncompressed data size, uncompressed index size, row count, and more.
   2. Move the compression type here from BlockTrailer. This follows HBase's design, so we no longer have to create a compressionFactory and a decompressor for each block.
   3. Add a VERSION number.

### Tests

Please see:

* `org.apache.paimon.sst.IndexTest` for the index tests
* `org.apache.paimon.sst.SstFileTest` for the file tests
* `org.apache.paimon.lookup.sort.SortLookupStoreFactoryTest` for the lookup store tests

### API and Format

This PR does not change any public API.

### Documentation

TODO
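
To illustrate the leveled data index described under "Core improvements", here is a minimal in-memory sketch. It is not the code in this PR: the class `LeveledIndexSketch`, its `Entry` type, the byte-size estimate, and the use of in-memory lists in place of blocks written to the SST file are all assumptions made for illustration. It only shows the idea that a leaf index block is spilled once it exceeds `maxIndexBlockSize`, and that a lookup starts from the root index alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-level index sketch; these are not Paimon's actual classes. */
class LeveledIndexSketch {

    /** An index entry: the last key covered, and the id of the block it points to. */
    static final class Entry {
        final String lastKey;
        final int target;

        Entry(String lastKey, int target) {
            this.lastKey = lastKey;
            this.target = target;
        }
    }

    private final int maxIndexBlockSize;
    // Stands in for leaf index blocks that would be written to the SST file.
    private final List<List<Entry>> spilledLeaves = new ArrayList<>();
    // The only part a reader would need to load when opening the file.
    private final List<Entry> root = new ArrayList<>();
    private List<Entry> currentLeaf = new ArrayList<>();
    private int currentLeafSize = 0;

    LeveledIndexSketch(int maxIndexBlockSize) {
        this.maxIndexBlockSize = maxIndexBlockSize;
    }

    /** Records that the data block with id dataBlockId ends with lastKey. */
    void add(String lastKey, int dataBlockId) {
        currentLeaf.add(new Entry(lastKey, dataBlockId));
        // Rough size estimate (assumption): key characters plus a 4-byte block id.
        currentLeafSize += lastKey.length() + Integer.BYTES;
        if (currentLeafSize > maxIndexBlockSize) {
            spill();
        }
    }

    /** Moves the current leaf to the spilled list and registers it in the root. */
    private void spill() {
        if (currentLeaf.isEmpty()) {
            return;
        }
        String last = currentLeaf.get(currentLeaf.size() - 1).lastKey;
        spilledLeaves.add(currentLeaf);
        root.add(new Entry(last, spilledLeaves.size() - 1));
        currentLeaf = new ArrayList<>();
        currentLeafSize = 0;
    }

    /** Flushes the pending leaf and returns the root index. Call before lookup. */
    List<Entry> finish() {
        spill();
        return root;
    }

    /** Returns the id of the data block that may contain key, or -1 if none. */
    int lookup(String key) {
        int leaf = search(root, key);
        return leaf < 0 ? -1 : search(spilledLeaves.get(leaf), key);
    }

    /** Finds the first entry whose lastKey is >= key (linear scan for brevity). */
    private static int search(List<Entry> block, String key) {
        for (Entry e : block) {
            if (key.compareTo(e.lastKey) <= 0) {
                return e.target;
            }
        }
        return -1;
    }
}
```

With a threshold of 12 and single-character keys `b`, `d`, `f`, `h`, `j` added for data blocks 0 to 4, the third `add` pushes the leaf past the threshold and spills it, so the root ends up with two entries after `finish()`. A lookup for `g` then touches the root and only the second leaf. A real implementation would presumably use binary search within a block and could have more than two levels.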
