nsivabalan opened a new pull request, #18828:
URL: https://github.com/apache/hudi/pull/18828

   ### Change Logs
   
   Fix the per-task write token applied to rollback log files written by 
`RollbackHelperV1` (MOR table version 6 code path):
   
   - `hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java`
     - When the pre-computed log version map indicates an existing log file for 
the file group, **keep** the per-task write token from 
`CommonClientUtils.generateWriteToken(taskContextSupplier)` and explicitly bump 
the writer's log version to `latest + 1` (for `doDelete=true`). Previously the 
code overrode the per-task token with the existing log's token (often 
`UNKNOWN_WRITE_TOKEN` = `1-0-1`), so the new rollback log inherited that token 
and retried rollbacks collided on file name.
     - For `doDelete=false` (stats-only collection, no append), leave the 
version unset so `WriterBuilder.build()` rediscovers the existing path, because 
the downstream `storage.getPathInfo` lookup requires the file to exist on disk.
     - Switch the "no log file present" sentinel in `preComputeLogVersions` 
from `(LOGFILE_BASE_VERSION, UNKNOWN_WRITE_TOKEN)` to `(LOGFILE_BASE_VERSION, 
null)`. The previous sentinel was indistinguishable from a real log file whose 
token happened to equal `UNKNOWN_WRITE_TOKEN`.
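
The version and token handling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `RollbackHelperV1`: the class, the `versionToSet` and `logFileName` helpers, the file name layout, and the sample tokens are all hypothetical. The sketch also assumes that a first attempt and its retry resolve to the same log version, which is the situation in which a shared token would make the names collide.

```java
import java.util.OptionalInt;

public class RollbackLogNamingSketch {

    /**
     * Hypothetical helper: decides which log version, if any, to set on the writer.
     * A null existingToken stands for the "no log file present" sentinel.
     */
    public static OptionalInt versionToSet(int latestVersion, String existingToken, boolean doDelete) {
        boolean logFileExists = existingToken != null;
        if (logFileExists && doDelete) {
            // Appending a rollback block: target the version right after the latest one.
            return OptionalInt.of(latestVersion + 1);
        }
        // Stats-only, or no known log file: leave unset so the builder resolves the path itself
        // (an assumption of this sketch for the "no log file" case).
        return OptionalInt.empty();
    }

    /** Assumed file name layout, used only to show how the token affects the name. */
    public static String logFileName(String fileId, String baseInstant, int version, String writeToken) {
        return "." + fileId + "_" + baseInstant + ".log." + version + "_" + writeToken;
    }

    public static void main(String[] args) {
        int next = versionToSet(1, "1-0-1", true).getAsInt();

        // Old behavior: both attempts inherit the existing log's token, so the names match.
        String firstOld = logFileName("fg-1", "001", next, "1-0-1");
        String retryOld = logFileName("fg-1", "001", next, "1-0-1");
        System.out.println(firstOld.equals(retryOld)); // true

        // New behavior: each attempt keeps its own per-task token (values made up here).
        String firstNew = logFileName("fg-1", "001", next, "0-12-34");
        String retryNew = logFileName("fg-1", "001", next, "0-13-35");
        System.out.println(firstNew.equals(retryNew)); // false
    }
}
```

Using `null` for the missing-file case keeps it distinguishable from a real log file whose token is `1-0-1`, which the previous sentinel could not do.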
   
   ### Impact
   
   - Repeated rollback attempts no longer create colliding log file names on 
MOR (v6) tables.
   - The metadata table no longer sees conflicting write tokens for rollback log 
files.
   - File-slice ordering remains consistent across retries.
   
   ### Risk level
   
   Low. The change is scoped to `RollbackHelperV1` (table version 6 path only); 
table version 8+ uses `RollbackHelper` and is unaffected.
   
   ### Documentation Update
   
   None.
   
   ### Tests
   
   - New 
`TestMergeOnReadRollbackActionExecutor#testRollbackWriteTokenGeneration` 
(parameterized on `enableMetadataTable`) forces table version 6, runs a 
rollback, asserts rollback log file write tokens match `\d+-\d+-\d+` and are 
not `1-0-1`, then simulates a rollback retry (by backing up + restoring the 
inflight commit's timeline files and marker directory) and confirms the second 
rollback produces an additional distinct log file per file group with no 
collision against the first rollback's output.
   - Existing `TestRollbackHelper` V1 cases updated to expect rollback log 
paths using the per-task `0-0-0` token (from `LocalTaskContextSupplier`) 
instead of the legacy `1-0-1` token; `doDelete=false` case unchanged.
   - `TestMarkerBasedRollbackStrategy` updated to extract the rollback log path 
from rollback stats rather than constructing it with a hardcoded write token.
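
The token assertions described for the new test can be approximated with a small standalone check. This is a sketch only, not the test code itself: the class and method names are hypothetical, and it assumes the token has already been extracted from the rollback log file name.

```java
import java.util.regex.Pattern;

public class WriteTokenCheckSketch {

    // Shape the test expects for a write token: three dash-separated numbers.
    private static final Pattern WRITE_TOKEN = Pattern.compile("\\d+-\\d+-\\d+");

    // Legacy token value that rollback log files should no longer inherit.
    private static final String LEGACY_TOKEN = "1-0-1";

    /** Returns true when the token is well formed and is not the legacy value. */
    public static boolean isPerTaskToken(String token) {
        return token != null
            && WRITE_TOKEN.matcher(token).matches()
            && !LEGACY_TOKEN.equals(token);
    }

    public static void main(String[] args) {
        System.out.println(isPerTaskToken("0-0-0")); // true
        System.out.println(isPerTaskToken("1-0-1")); // false
        System.out.println(isPerTaskToken("abc"));   // false
    }
}
```

`matches()` requires the whole string to fit the pattern, so a token with extra leading or trailing characters is rejected.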
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   
   Fixes #18827
   

