hussein-awala opened a new pull request, #7041:
URL: https://github.com/apache/hudi/pull/7041
### Change Logs
When the clean planner lists the files in the partitions and finds no files to
delete, the clean operation is skipped without writing a commit. On the next
clean, if incremental cleaning mode is enabled, the planner finds no record of
the commits it has already checked, so it rechecks all the files a second time.
This PR creates a clean commit that contains the `earliestCommitToRetain`
regardless of whether the list of deleted files is empty. The next clean
planner run then checks only the partitions that have changed since the
`earliestCommitToRetain` recorded in the last clean commit.
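The partition selection described above can be sketched roughly as follows. This is a minimal illustration, not Hudi's planner code: `IncrementalCleanSketch`, `partitionsToCheck`, and the map-based timeline are hypothetical names invented for this example. It also assumes that instant times are strings that sort chronologically, and that commits at or after the recorded boundary are the ones considered.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: these names are hypothetical, not Hudi classes.
class IncrementalCleanSketch {

    /**
     * Returns the partitions a clean planner would need to inspect.
     *
     * @param partitionsByInstant commit instant time -> partitions written by that commit
     * @param allPartitions every partition in the table
     * @param lastEarliestCommitToRetain value recorded by the previous clean commit,
     *        or empty if no usable clean metadata exists
     */
    static Set<String> partitionsToCheck(
            Map<String, List<String>> partitionsByInstant,
            Set<String> allPartitions,
            Optional<String> lastEarliestCommitToRetain) {
        // No previous clean metadata: fall back to scanning every partition.
        if (!lastEarliestCommitToRetain.isPresent()) {
            return new TreeSet<>(allPartitions);
        }
        // Assumption: instant times compare chronologically as strings.
        String boundary = lastEarliestCommitToRetain.get();
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : partitionsByInstant.entrySet()) {
            // Assumption: commits at or after the boundary are considered.
            if (e.getKey().compareTo(boundary) >= 0) {
                changed.addAll(e.getValue());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> timeline = new LinkedHashMap<>();
        timeline.put("20221001", Arrays.asList("2022/01/01"));
        timeline.put("20221002", Arrays.asList("2022/10/02"));
        timeline.put("20221003", Arrays.asList("2022/10/02", "2022/10/03"));
        Set<String> all = new TreeSet<>(
                Arrays.asList("2022/01/01", "2022/10/02", "2022/10/03"));

        // Without a previous clean commit, every partition is listed.
        System.out.println(partitionsToCheck(timeline, all, Optional.empty()));
        // With a recorded boundary, only recently changed partitions are listed.
        System.out.println(partitionsToCheck(timeline, all, Optional.of("20221002")));
    }
}
```

In this sketch the old partition `2022/01/01` is skipped once a boundary is recorded, which is the case the PR targets: without a clean commit carrying `earliestCommitToRetain`, every run takes the first branch and lists all partitions.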
### Impact
A new clean commit is added to the timeline even when no files were actually
cleaned. The benefit is a large performance improvement (and a reduction in S3
listing costs) for clean operations on tables whose old partitions seldom
change.
### Risk level (write none, low medium or high below)
low:
The risk level is low because these changes affect only clean plans that have
no files to delete. I kept the checks on empty commit files to avoid the Avro
empty-file exception, and I improved the method that cleans up these empty
files. If for some reason an empty Avro file exists, a brute-force scan is
performed to prepare the clean plan.
I will test these changes on our project within the week to make sure
everything is fine.
### Documentation Update
_Describe any necessary documentation update if there is any new feature,
config, or user-facing change_
- _The config description must be updated if new configs are added or the
default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website.
Please create a Jira ticket, attach the ticket number here and follow the
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to
make changes to the website._
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed