danielhumanmod opened a new pull request, #1508:
URL: https://github.com/apache/polaris/pull/1508
### Motivation
This is a follow-up to #312.
Previously, when `DROP TABLE PURGE` was issued, Polaris cleaned up data files,
manifest files, and metadata files, but did not clean up partition-level
statistics files.
### Current Behavior
Partition statistics files (`partition_stats`) remain in storage after the
table is dropped. These files are listed in the `TableMetadata` but are not
included in the batch deletion task, so they are left behind as orphaned files.
### Changes Introduced
- Added support for including `partitionStatisticsFiles` from `TableMetadata` in
  the batch cleanup task (`BatchFileCleanupTaskHandler`).
- Updated `getMetadataFileBatches()` to collect and batch partition
  statistics files for deletion.
- Added test coverage in `TableCleanupTaskHandlerTest` and
  `BatchFileCleanupTaskHandlerTest` to verify that:
  - partition statistics files are scheduled for deletion
  - they are correctly deleted by the task handler
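
For readers unfamiliar with the batching step, here is a minimal, self-contained sketch of the idea: gather the paths of previous metadata files, statistics files, and partition statistics files into one list, then split that list into fixed-size batches, each of which would become one cleanup task. This is not the code in this PR. The class `MetadataFileBatcher`, its `batch` method, the plain-string paths, and the batch size are all hypothetical and chosen only for illustration; the real handler works with Iceberg's `TableMetadata` rather than raw strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not the Polaris implementation. */
final class MetadataFileBatcher {

  private MetadataFileBatcher() {}

  /**
   * Merges the three groups of file paths (in the order given) and splits the
   * result into batches of at most {@code batchSize} paths each.
   */
  static List<List<String>> batch(
      List<String> previousMetadataFiles,
      List<String> statisticsFiles,
      List<String> partitionStatisticsFiles,
      int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }

    // Partition statistics files are collected alongside the other metadata
    // files so that they are no longer left behind after a purge.
    List<String> all =
        Stream.of(previousMetadataFiles, statisticsFiles, partitionStatisticsFiles)
            .flatMap(List::stream)
            .collect(Collectors.toList());

    // Slice the combined list into consecutive chunks of batchSize.
    List<List<String>> batches = new ArrayList<>();
    for (int start = 0; start < all.size(); start += batchSize) {
      int end = Math.min(start + batchSize, all.size());
      batches.add(new ArrayList<>(all.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    // Example paths are made up; each printed list stands for one cleanup task.
    List<List<String>> batches =
        batch(
            List.of("s3://bucket/tbl/metadata/v1.metadata.json"),
            List.of("s3://bucket/tbl/metadata/stats-1.puffin"),
            List.of(
                "s3://bucket/tbl/metadata/partition-stats-1.parquet",
                "s3://bucket/tbl/metadata/partition-stats-2.parquet"),
            2);
    batches.forEach(System.out::println);
  }
}
```

Batching keeps each deletion task bounded in size, so a table with many metadata and statistics files is cleaned up by several small tasks rather than one large one.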
### Desired Outcome
After a `DROP TABLE PURGE`, all Iceberg table metadata files, including
partition-level statistics files, are cleaned up as expected.