nsivabalan opened a new pull request, #18828:
URL: https://github.com/apache/hudi/pull/18828
### Change Logs
Fix the per-task write token applied to rollback log files written by
`RollbackHelperV1` (MOR table version 6 code path):
- `hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackHelperV1.java`
  - When the pre-computed log version map indicates an existing log file for
    the file group, **keep** the per-task write token from
    `CommonClientUtils.generateWriteToken(taskContextSupplier)` and explicitly bump
    the writer's log version to `latest + 1` (for `doDelete=true`). Previously the
    code overrode the per-task token with the existing log's token (often
    `UNKNOWN_WRITE_TOKEN` = `1-0-1`), so the new rollback log inherited that token
    and retried rollbacks collided on file name.
  - For `doDelete=false` (stats-only collection, no append), leave the
    version unset so `WriterBuilder.build()` rediscovers the existing path; the
    downstream `storage.getPathInfo` lookup requires the file to exist on disk.
  - Switch the "no log file present" sentinel in `preComputeLogVersions`
    from `(LOGFILE_BASE_VERSION, UNKNOWN_WRITE_TOKEN)` to `(LOGFILE_BASE_VERSION, null)`.
    The previous sentinel was indistinguishable from a real log file whose
    token happened to equal `UNKNOWN_WRITE_TOKEN`.
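A minimal Java sketch of the writer-settings decision described in the change log, for illustration only. `RollbackLogNaming`, `LatestLog`, `WriterChoice`, `choose` and `logFileName` are hypothetical names, not Hudi APIs. The log file name layout `.{fileId}_{baseInstant}.log.{version}_{writeToken}` is an assumption, and so is the handling of the `null`-token sentinel (version left unset), which the change log does not spell out.

```java
import java.util.Optional;

// Hypothetical sketch of the RollbackHelperV1 version/token choice.
// None of these names are Hudi APIs; the file-name layout is assumed.
public class RollbackLogNaming {

  static final int LOGFILE_BASE_VERSION = 1;

  /** Pre-computed state for one file group; a null token means "no log file present". */
  record LatestLog(int version, String token) {
    static LatestLog absent() {
      return new LatestLog(LOGFILE_BASE_VERSION, null);
    }
  }

  /** An empty version means "leave it unset and let the writer builder resolve it". */
  record WriterChoice(Optional<Integer> version, String token) {}

  static WriterChoice choose(LatestLog latest, String perTaskToken, boolean doDelete) {
    // The per-task token is always kept; the existing log's token is never inherited.
    if (latest.token() != null && doDelete) {
      // Existing log file and an append: bump past the latest version.
      return new WriterChoice(Optional.of(latest.version() + 1), perTaskToken);
    }
    // doDelete=false (stats only), or no log file yet (assumed handling):
    // leave the version unset.
    return new WriterChoice(Optional.empty(), perTaskToken);
  }

  /** Assumed layout: .{fileId}_{baseInstant}.log.{version}_{writeToken} */
  static String logFileName(String fileId, String baseInstant, int version, String token) {
    return "." + fileId + "_" + baseInstant + ".log." + version + "_" + token;
  }

  public static void main(String[] args) {
    // First rollback attempt: the latest existing log is version 2 with the legacy token.
    WriterChoice first = choose(new LatestLog(2, "1-0-1"), "0-0-0", true);
    // Retry: the first attempt's log (version 3) is now the latest on disk.
    WriterChoice retry = choose(new LatestLog(3, first.token()), "0-0-0", true);
    System.out.println(logFileName("fg1", "001", first.version().orElseThrow(), first.token())); // .fg1_001.log.3_0-0-0
    System.out.println(logFileName("fg1", "001", retry.version().orElseThrow(), retry.token())); // .fg1_001.log.4_0-0-0
  }
}
```

In this sketch the token comes from the task rather than from whatever log already exists, and the version is always bumped past the latest one observed. A retry therefore lands on a new file name instead of reusing the first attempt's name.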
### Impact
- Repeated rollback attempts no longer create colliding log file names on
MOR (v6) tables.
- Metadata table no longer sees conflicting write tokens for rollback log
files.
- File-slice ordering remains consistent across retries.
### Risk level
low — change is scoped to `RollbackHelperV1` (table version 6 path only);
table version 8+ uses `RollbackHelper` and is unaffected.
### Documentation Update
None.
### Tests
- New
`TestMergeOnReadRollbackActionExecutor#testRollbackWriteTokenGeneration`
(parameterized on `enableMetadataTable`) forces table version 6, runs a
rollback, asserts rollback log file write tokens match `\d+-\d+-\d+` and are
not `1-0-1`, then simulates a rollback retry (by backing up + restoring the
inflight commit's timeline files and marker directory) and confirms the second
rollback produces an additional distinct log file per file group with no
collision against the first rollback's output.
- Existing `TestRollbackHelper` V1 cases updated to expect rollback log
paths using the per-task `0-0-0` token (from `LocalTaskContextSupplier`)
instead of the legacy `1-0-1` token; `doDelete=false` case unchanged.
- `TestMarkerBasedRollbackStrategy` updated to extract the rollback log path
from rollback stats rather than constructing it with a hardcoded write token.
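The checks described for `testRollbackWriteTokenGeneration` can be sketched as a few plain Java helpers. This is not the actual test code. `WriteTokenCheck`, `tokenOf`, `isPerTaskToken` and `noCollision` are hypothetical names. Extracting the token after the last `_` assumes the log file name layout `.{fileId}_{instant}.log.{version}_{writeToken}`.

```java
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helpers mirroring the assertions described for
// testRollbackWriteTokenGeneration; not the real test implementation.
public class WriteTokenCheck {

  // Three dash-separated integers, e.g. "0-0-0".
  private static final Pattern WRITE_TOKEN = Pattern.compile("\\d+-\\d+-\\d+");
  private static final String UNKNOWN_WRITE_TOKEN = "1-0-1";

  /** Token after the last '_' (assumes .{fileId}_{instant}.log.{version}_{token}). */
  static String tokenOf(String logFileName) {
    return logFileName.substring(logFileName.lastIndexOf('_') + 1);
  }

  /** True if the token has the expected format and is not the legacy UNKNOWN_WRITE_TOKEN. */
  static boolean isPerTaskToken(String token) {
    return WRITE_TOKEN.matcher(token).matches() && !UNKNOWN_WRITE_TOKEN.equals(token);
  }

  /** A retry must add new log files rather than reuse any name from the first rollback. */
  static boolean noCollision(List<String> firstRollback, List<String> retry) {
    return Collections.disjoint(firstRollback, retry);
  }

  public static void main(String[] args) {
    List<String> first = List.of(".fg1_001.log.3_0-0-0");
    List<String> retry = List.of(".fg1_001.log.4_0-0-0");
    System.out.println(isPerTaskToken(tokenOf(first.get(0)))); // true
    System.out.println(noCollision(first, retry));             // true
  }
}
```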
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
Fixes #18827