nsivabalan opened a new pull request #1912:
URL: https://github.com/apache/hudi/pull/1912
## What is the purpose of the pull request
Introduces a TimedWaitOnAppearConsistencyGuard for eventually consistent
stores. It sleeps for a configured period of time only on APPEAR and is a
no-op for DISAPPEAR. It is intended for eventually consistent stores such as
the S3A filesystem; the rationale follows.
This guard is used when deleting data files corresponding to marker files
that need to be deleted.
There are two tricky cases to consider. Case 1: data file creation is
eventually consistent, so the file may not be found when deletes are
issued. Case 2: the data file was never created in the first place because
the process crashed.
In S3A, GET and LIST are eventually consistent, and the delete()
implementation internally does a LIST/EXISTS.
Prior to this patch, Hudi used FailSafeConsistencyGuard, which did the
following to delete data files.
Step1: wait for all files to appear, using a linear backoff timer with a
max timer.
Step2: issue deletes.
Step3: wait for all files to disappear.
Step1 and Step2 are handled by {@link FailSafeConsistencyGuard}.
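The wait steps in the older flow can be sketched roughly as below. This is an illustrative sketch only, not Hudi's FailSafeConsistencyGuard: the class name `LinearBackoffWaitSketch`, the method `waitForVisibility`, and all of its parameters are hypothetical, and the store is abstracted as a plain `Predicate<String>` that reports whether a path is currently visible.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "wait with linear backoff"; names are assumptions,
// not part of Hudi's API.
final class LinearBackoffWaitSketch {
    private LinearBackoffWaitSketch() {}

    /**
     * Polls until every path is in the expected visibility state.
     * Sleeps initialBackoffMs, 2 * initialBackoffMs, ... between checks
     * (linear backoff) and gives up after maxChecks checks.
     *
     * @return true if all paths reached the expected state, false on
     *         timeout or if the waiting thread was interrupted
     */
    static boolean waitForVisibility(List<String> paths, Predicate<String> isVisible,
                                     boolean expectVisible, int maxChecks,
                                     long initialBackoffMs) {
        for (int attempt = 1; attempt <= maxChecks; attempt++) {
            boolean allInExpectedState = true;
            for (String path : paths) {
                if (isVisible.test(path) != expectVisible) {
                    allInExpectedState = false;
                    break;
                }
            }
            if (allInExpectedState) {
                return true;
            }
            if (attempt < maxChecks) {
                try {
                    // The sleep grows linearly with the attempt number.
                    Thread.sleep(initialBackoffMs * attempt);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

Under these assumptions, Step1 would call `waitForVisibility(paths, store, true, ...)` before the deletes and Step3 would call it with `false` afterwards. Note that Case 2 (a file that was never created) makes the Step1 wait run until it times out.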
We are simplifying these steps with TimedWaitOnAppearConsistencyGuard as
below.
Step1: Sleep for a configured threshold.
Step2: issue deletes.
With this, any file that was created should become visible within the
configured threshold (eventual consistency).
delete() returns false if the file is not found, so both cases are taken
care of by this {@link ConsistencyGuard}.
Step3 is not required: if the file is not found, delete() would have
returned false, so we ignore the return values. But if the file exists and
could not be deleted, an exception would be thrown.
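The simplified flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this patch: `TimedWaitDeleteSketch`, the `Deleter` interface, and `sleepThenDelete` are hypothetical names, and `Deleter` only stands in for a file system delete call that returns false for a missing file and throws when an existing file cannot be removed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical sketch of "sleep once, then delete"; names are assumptions,
// not part of Hudi's API.
final class TimedWaitDeleteSketch {
    private TimedWaitDeleteSketch() {}

    /** Stand-in for a file system delete: false means the file was not found. */
    interface Deleter {
        boolean delete(String path) throws IOException;
    }

    /**
     * Step1: sleep for the configured threshold so that created files become visible.
     * Step2: issue deletes. A false return (file not found) is tolerated, which
     * covers files that were never created. A failure to delete an existing
     * file is propagated to the caller.
     *
     * @return the number of files that were actually deleted
     */
    static int sleepThenDelete(List<String> paths, Deleter deleter, long sleepMs) {
        try {
            Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for files to appear", e);
        }
        int deleted = 0;
        for (String path : paths) {
            try {
                if (deleter.delete(path)) {
                    deleted++;
                }
                // false: nothing to delete for this path, so no further action is needed.
            } catch (IOException e) {
                // Rethrown unchecked here only to keep the sketch's signature simple.
                throw new UncheckedIOException("Could not delete " + path, e);
            }
        }
        return deleted;
    }
}
```

Because a missing file is not treated as an error, no wait-for-disappear pass follows the deletes in this sketch. The returned count is informational only.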
## Brief change log
- *Added TimedWaitOnAppearConsistencyGuard to delete a bunch of files in
eventually consistent cloud stores*
## Verify this pull request
This change added tests and can be verified as follows:
- *Added tests to TestConsistencyGuard to verify the change.*
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.