kbuci opened a new issue, #18050: URL: https://github.com/apache/hudi/issues/18050
### Task Description

**What needs to be done:**

We want to clarify the expected behavior when multiple jobs concurrently attempt to schedule/execute a rollback plan for the same (failed write) instant. At a high level, we are considering two possible "design philosophies":

(A) For a given inflight instant, at most one rollback plan can be scheduled, and any concurrent job always re-uses that same rollback plan. Also, only one job can attempt to execute a rollback plan at any given time - any other job attempting to execute the same rollback plan should fail with a (transient) exception until the original job completes/fails.

OR

(B) It is legal to have multiple rollback plans against the same instant. In addition, if multiple jobs attempt to execute the same rollback plan (delete instant/data files, create rollback instant files, write to MDT, write to MOR data log files), they should neither fail nor corrupt the dataset. In other words, if we ever see such cases causing transient failures/corruption in HUDI, it would be considered a "bug".

**Why this task is needed:**

In a multiwriter setup, concurrent writers may invoke clean and rollback of failed writes at the same time. In our org's internal HUDI 0.14 build with GCS, we have seen transient failures in such scenarios:
- If two jobs attempt to create a rollback.inflight/rollback file at the same time, the latter will fail with a DFS error
- If two jobs attempt to delete the same instant file at the same time, the latter will fail with a DFS error

**Proposed approach**

In our org we subscribe to (B), since:
- In theory, data file deletions are idempotent, so it makes sense to "tolerate" concurrent jobs deleting the same files without throwing an error
- MDT (and MOR tables) explicitly supports having the same file be "deleted" multiple times (via "spurious deletes") as legal behavior in HUDI. So even if we saw a transient failure/corruption due to two concurrent rollbacks (or two concurrent executions of the same rollback plan) marking the same file as deleted over and over again, we would consider that a "bug"

As a result, we personally consider the above issues "bugs" rather than application/usage errors. And we have partially resolved them by making fixes to how DFS APIs are used in HUDI:
- When deleting an instant file, if the DFS deletion call returns "false", we don't immediately return false. Instead we make a follow-up `exists` call to check whether the file is still there; if it is gone, we return "true" to indicate the file was actually deleted (see the sketch at the end of this section). We noticed on GCS that 2+ concurrent calls to delete the same file may cause some calls to return `false` even though one of the calls actually deleted the file. Although this is not ideal behavior from the DFS, I believe it is technically legal on the DFS side (returning `false` doesn't necessarily mean the DFS's consistency model was broken - there may simply have been a transient IO failure).

But before we upstream any such changes, we wanted to confirm with the community that we are all aligned on (B) - that this is indeed an expectation HUDI must satisfy, and that we have not been making incorrect assumptions about how HUDI rollbacks are supposed to work. Otherwise, the transient failures we discussed might not even be considered HUDI bugs in the first place.
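To make the proposed fix concrete, here is a minimal sketch of the delete-with-exists fallback described above, written against the Hadoop `FileSystem` API. The class and method names are hypothetical, purely for illustration - the actual change in our build lives in HUDI's DFS wrapper code:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch of the delete-with-exists fallback. On object stores
 * like GCS, concurrent delete() calls on the same path may cause one call
 * to return false even though the file is actually gone.
 */
public class TolerantInstantFileDeleter {

  /**
   * Attempts to delete an instant file, tolerating a concurrent job having
   * deleted it first.
   *
   * @return true if the file is gone after this call (whether this job or a
   *         concurrent one removed it), false otherwise.
   */
  public static boolean deleteInstantFile(FileSystem fs, Path instantFile) throws IOException {
    // Happy path: this job's delete succeeded.
    if (fs.delete(instantFile, false)) {
      return true;
    }
    // delete() returned false - before treating this as a failure, check
    // whether the file is actually gone (e.g. a concurrent rollback job
    // removed it, or the DFS hit a transient hiccup after deleting).
    return !fs.exists(instantFile);
  }
}
```

Since deletes are idempotent under (B), the only invariant a caller cares about is "the file no longer exists", so the follow-up `exists` check answers exactly that question rather than trusting the boolean from `delete()` alone.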
### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)

**Related issues:** NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.
