LsomeYeah opened a new pull request, #5255: URL: https://github.com/apache/paimon/pull/5255
<!-- Please specify the module before the PR name: [core] ... or [flink] ... -->

### Purpose

<!-- Linking this pull request to the issue -->
Linked issue: close #xxx

<!-- What is the purpose of the change -->
Record-level expiration happens during compaction, but some data files are not rewritten even in a full compaction, so the expired records they contain may not be removed for a long time. This PR ensures that record-level expiration takes effect during full compaction.

During full compaction, the following kinds of files are not rewritten:
- Large file whose level is maxLevel and which does not overlap with any other file: nothing is done to the file.
- Large file where outputLevel is not maxLevel, or which has no deleted records: the file level is upgraded directly to outputLevel without rewriting.

With this change, these files are rewritten if they contain expired records.

In addition, full compaction picks no files for compaction in the following cases:
- The LSM tree is empty.
- The LSM tree contains only one sorted run, at maxLevel.

With this change, when the LSM tree contains only one maxLevel sorted run, the files containing expired records are picked for full compaction.

### Tests

<!-- List UT and IT cases to verify this change -->
- RecordLevelExpireTest#testIsExpireFile
- RecordLevelExpireTest#testTotallyExpire

### API and Format

<!-- Does this change affect API or storage format -->

### Documentation

<!-- Does this change introduce a new feature -->
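
For reviewers, the per-file decision described under Purpose can be sketched roughly as follows. This is a minimal, hypothetical Java sketch and not Paimon's actual implementation: `FileInfo`, `Action`, `decide`, `pickExpired`, and the timestamp-based expiry check are all names and assumptions introduced here for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Paimon's real classes.
final class RecordExpireCompactionSketch {

    enum Action { KEEP, UPGRADE, REWRITE }

    // Simplified stand-in for a data file's metadata (assumed fields).
    static final class FileInfo {
        final int level;
        final boolean large;
        final boolean overlapsOthers;
        final boolean hasDeleteRecords;
        final long minRecordTimeSeconds; // oldest record timestamp in the file

        FileInfo(int level, boolean large, boolean overlapsOthers,
                 boolean hasDeleteRecords, long minRecordTimeSeconds) {
            this.level = level;
            this.large = large;
            this.overlapsOthers = overlapsOthers;
            this.hasDeleteRecords = hasDeleteRecords;
            this.minRecordTimeSeconds = minRecordTimeSeconds;
        }
    }

    // Assumption: a file contains expired records when its oldest record
    // is older than the configured expiration time.
    static boolean containsExpiredRecords(FileInfo f, long nowSeconds, long expireSeconds) {
        return nowSeconds - f.minRecordTimeSeconds > expireSeconds;
    }

    // Decide what full compaction does with a single file.
    static Action decide(FileInfo f, int outputLevel, int maxLevel,
                         long nowSeconds, long expireSeconds) {
        // Assumption: small or overlapping files are rewritten as usual.
        if (!f.large || f.overlapsOthers) {
            return Action.REWRITE;
        }
        // The change described in this PR: expired records force a rewrite.
        if (containsExpiredRecords(f, nowSeconds, expireSeconds)) {
            return Action.REWRITE;
        }
        // Large, non-overlapping file already at maxLevel: do nothing.
        if (f.level == maxLevel) {
            return Action.KEEP;
        }
        // Large file that can move to outputLevel without being rewritten.
        if (outputLevel != maxLevel || !f.hasDeleteRecords) {
            return Action.UPGRADE;
        }
        return Action.REWRITE;
    }

    // For a tree with a single maxLevel sorted run: pick only the files
    // that contain expired records.
    static List<FileInfo> pickExpired(List<FileInfo> run, long nowSeconds, long expireSeconds) {
        List<FileInfo> picked = new ArrayList<>();
        for (FileInfo f : run) {
            if (containsExpiredRecords(f, nowSeconds, expireSeconds)) {
                picked.add(f);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        FileInfo fresh = new FileInfo(5, true, false, false, 990L);
        FileInfo stale = new FileInfo(5, true, false, false, 100L);
        System.out.println(decide(fresh, 5, 5, 1000L, 60L));
        System.out.println(decide(stale, 5, 5, 1000L, 60L));
    }
}
```

The expiry check is placed before the "keep" and "upgrade" shortcuts so that a file which would otherwise be skipped is still rewritten when it holds expired records, which is the behavior this PR describes.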
