kbuci opened a new issue, #18050:
URL: https://github.com/apache/hudi/issues/18050

   ### Task Description
   
   **What needs to be done:**
   We want to clarify what the expected behavior should be when multiple jobs 
concurrently attempt to schedule/execute a rollback plan for the same (failed 
write) instant. At a high level, we are considering two possible "design 
philosophies":
   (A) For a given inflight instant, at most one rollback plan can be 
scheduled, with any concurrent job re-using that same rollback plan. In 
addition, only one job can execute a rollback plan at any given time - any 
other job attempting to execute the same rollback plan should fail with a 
(transient) exception until the original job completes or fails.
   OR
   (B) It is legal to have multiple rollback plans against the same instant. In 
addition, if multiple jobs attempt to execute the same rollback plan (delete 
instant/data files, create rollback instant files, write to the MDT, write to 
MOR data log files), they should neither fail nor corrupt the dataset. In other 
words, if we ever see such a case cause a transient failure or corruption in 
HUDI, it would be considered a "bug"
   
   **Why this task is needed:**
   In a multiwriter setup, concurrent writers may invoke clean and rollback of 
failed writes at the same time. 
   In our org's internal HUDI 0.14 build with GCS, we have seen transient 
failures in such scenarios:
   - If two jobs attempt to create a rollback.inflight/rollback instant file at 
the same time, the latter will fail with a DFS error
   - If two jobs attempt to delete an instant file at the same time, the latter 
will fail with a DFS error
   
   **Proposed approach**
   In our org we subscribe to (B), since
   - In theory, data file deletions are idempotent, so it makes sense to 
"tolerate" concurrent jobs deleting the same files without throwing an error
   - The MDT (and MOR tables) explicitly support having the same file be 
"deleted" multiple times (via `spurious deletes`) as legal behavior in HUDI. So 
if we saw a transient failure/corruption due to two concurrent rollbacks (or 
two concurrent executions of the same rollback plan) marking the same file as 
deleted over and over again, we would consider that a "bug"
   
   As a result, we personally consider the issues above "bugs" rather than 
application/usage errors, and we have partially resolved them by making fixes 
to how DFS APIs are used in HUDI:
   - When deleting an instant file, if the DFS API's delete call returns 
"false", we no longer immediately return false. Instead, we make an `exists` 
call to check whether the file is still there; if it is not, we return "true" 
to indicate the file was actually deleted. We do this because we noticed on GCS 
that two or more concurrent calls to delete the same file may cause some of 
those calls to return `false` even though one of them actually deleted the 
file. Although this is not ideal behavior from the DFS, I believe it is 
technically legal on the DFS side (returning `false` does not necessarily mean 
the DFS's consistency model was broken - there may simply have been a transient 
IO failure).
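   As a rough illustration of this delete-then-check pattern, here is a minimal 
sketch using `java.nio.file` on a local filesystem as a stand-in for the DFS 
API. The class and method names are hypothetical, not Hudi's actual code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TolerantDelete {

  // Hypothetical helper: report success if the file is gone after the call,
  // even when the delete API itself returned false (e.g. because a concurrent
  // job deleted the same file first).
  public static boolean deleteTolerant(Path file) throws IOException {
    if (Files.deleteIfExists(file)) {
      return true; // this caller performed the deletion
    }
    // Delete reported false; treat that as failure only if the file still exists.
    return !Files.exists(file);
  }

  public static void main(String[] args) throws IOException {
    Path f = Files.createTempFile("rollback", ".instant");
    System.out.println(deleteTolerant(f)); // true: we deleted it
    System.out.println(deleteTolerant(f)); // true: already gone, still counts as success
  }
}
```

   Under philosophy (B), both calls report success: what matters is the 
postcondition (the file no longer exists), not which concurrent caller 
performed the deletion.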
   
   But before we upstream any such changes, we wanted to confirm with the 
community that we are all aligned on (B) - that it is indeed an expectation 
HUDI must satisfy - and that we have not been making incorrect assumptions 
about how HUDI rollbacks are supposed to work. Otherwise, the transient 
failures discussed above might not be considered HUDI bugs in the first place.
   
   
   
   ### Task Type
   
   Code improvement/refactoring
   
   ### Related Issues
   
   **Parent feature issue:** (if applicable)
   **Related issues:**
   NOTE: Use `Relationships` button to add parent/blocking issues after issue 
is created.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
