nsivabalan opened a new pull request #1912:
URL: https://github.com/apache/hudi/pull/1912


   ## What is the purpose of the pull request
   
   Introduces TimedWaitOnAppearConsistencyGuard for eventually consistent 
stores. It sleeps for a configured period of time only on APPEAR and is a 
no-op for DISAPPEAR. It is meant specifically for eventually consistent stores 
such as the S3A filesystem; the rationale follows.
   This guard is used when deleting data files corresponding to marker files 
that need to be deleted.
   There are two tricky cases that need to be considered. Case 1: data file 
creation is eventually consistent, and hence
    the file may not be found when deletes are issued. Case 2: a data file was 
never created in the first place because the process crashed.
    In S3A, GET and LIST are eventually consistent, and the delete() 
implementation internally does a LIST/EXISTS.
    Prior to this patch, Hudi relied on FailSafeConsistencyGuard, which did 
the following to delete data files.
    Step1: wait for all files to appear, using a linear backoff timer with a 
max timer.
    Step2: issue deletes.
    Step3: wait for all files to disappear.
    Step1 and Step2 are handled by {@link FailSafeConsistencyGuard}.
   
    We are simplifying these steps with TimedWaitOnAppearConsistencyGuard as 
below.
    Step1: Sleep for a configured threshold.
    Step2: issue deletes.
   
    With this, any file that was actually created should be visible within the 
configured threshold (eventual consistency).
    delete() returns false on FileNotFound, so both cases are taken care 
of by this {@link ConsistencyGuard}.
   Step3 is not required: if the file is not found, delete() returns false, 
and we ignore the return value. But if the file exists and cannot be deleted, 
an exception is thrown.
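
   The simplified flow described above can be sketched as follows. This is only an illustration, not the code in this patch: the class and method names (`TimedWaitDeleteSketch`, `sleepThenDelete`) are hypothetical, and it uses `java.nio.file` on a local temp directory as a stand-in for the Hadoop FileSystem / S3A calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class TimedWaitDeleteSketch {

    /**
     * Hypothetical helper (not Hudi's API): sleep once for a fixed period,
     * then delete every candidate file and return how many were removed.
     */
    static int sleepThenDelete(List<Path> dataFiles, long sleepMs)
            throws IOException, InterruptedException {
        // Step1: sleep for the configured threshold so that any file that was
        // really created has time to become visible.
        Thread.sleep(sleepMs);

        int deleted = 0;
        for (Path file : dataFiles) {
            // Step2: issue the delete. A missing file (never created) yields
            // false, which is ignored; a real failure raises IOException.
            if (Files.deleteIfExists(file)) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("guard-sketch");
        Path created = Files.createFile(dir.resolve("data-1.parquet"));
        Path neverCreated = dir.resolve("data-2.parquet"); // simulates Case 2

        int deleted = sleepThenDelete(Arrays.asList(created, neverCreated), 50L);
        System.out.println("deleted=" + deleted); // prints "deleted=1"

        Files.delete(dir); // clean up the now-empty temp directory
    }
}
```

   Note that there is no wait-for-disappear step here: a file that is absent is simply skipped, and a file that exists but cannot be removed surfaces as an exception.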
   
   ## Brief change log
   
     - *Added TimedWaitOnAppearConsistencyGuard to delete a bunch of files in 
eventually consistent cloud stores*
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - *Added tests to TestConsistencyGuard to verify the change.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.



