codope opened a new pull request, #18988:
URL: https://github.com/apache/hudi/pull/18988
### Describe the issue this Pull Request addresses
Today the write commit callback (`HoodieWriteCommitCallback`) has two
limitations that make it awkward for consumers that want to react to what a
commit actually changed on storage:
1. The callback message doesn't say which files a commit replaced.
`HoodieWriteCommitCallbackMessage` carries the write stats, but a consumer that
wants to correlate each newly written base file with the existing base file
(and bootstrap source) it superseded has to rebuild a `FileSystemView` itself,
duplicating I/O the write client already paid for.
2. The callback only fires for data commits. Compaction and clustering
completions never invoke the callback, so consumers get no signal for
table-service commits.
This PR addresses both, backward-compatibly (no breaking change to the
public API).
### Summary and Changelog
What users gain: callback implementations now receive, per updated file
group, the previous base file path (and bootstrap source path, if any)
pre-resolved by the write client without rebuilding a file-system view. The
callback also fires on compaction and clustering completions, not just data
commits.
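As a rough illustration of the consumer side, the sketch below walks a message's per-file-group map and reports which base file each updated file group superseded. `ReplacedFileConsumerSketch`, its nested `Message` and `PrevFilePaths` classes, and `describeReplacedFiles` are hypothetical stand-ins written for this example; they are not the Hudi classes, and the real accessor names may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real Hudi types.
final class ReplacedFileConsumerSketch {

  /** Holds the two paths the PR describes; the bootstrap path may be null. */
  static final class PrevFilePaths {
    final String prevBaseFilePath;
    final String bootstrapBaseFilePath;

    PrevFilePaths(String prevBaseFilePath, String bootstrapBaseFilePath) {
      this.prevBaseFilePath = prevBaseFilePath;
      this.bootstrapBaseFilePath = bootstrapBaseFilePath;
    }
  }

  /** Simplified message: commit time, action type, and fileId -> previous paths. */
  static final class Message {
    final String commitTime;
    final String commitActionType;
    final Map<String, PrevFilePaths> prevFilePaths;

    Message(String commitTime, String commitActionType, Map<String, PrevFilePaths> prevFilePaths) {
      this.commitTime = commitTime;
      this.commitActionType = commitActionType;
      // Mirror the "empty map, never null" default described in the PR.
      this.prevFilePaths = prevFilePaths == null
          ? Collections.<String, PrevFilePaths>emptyMap()
          : Collections.unmodifiableMap(new LinkedHashMap<>(prevFilePaths));
    }
  }

  /** Builds one line per updated file group, naming the base file it replaced. */
  static List<String> describeReplacedFiles(Message message) {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, PrevFilePaths> entry : message.prevFilePaths.entrySet()) {
      PrevFilePaths prev = entry.getValue();
      String line = message.commitActionType + "@" + message.commitTime
          + " fileId=" + entry.getKey()
          + " replaced=" + prev.prevBaseFilePath;
      if (prev.bootstrapBaseFilePath != null) {
        line += " bootstrap=" + prev.bootstrapBaseFilePath;
      }
      lines.add(line);
    }
    return lines;
  }
}
```

A real callback would do the equivalent work inside its `call` method, reading the map from the message it receives instead of rebuilding a file-system view.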
Changelog (all changes under `hudi-client/hudi-client-common/`):
- `HoodieWriteCommitCallbackMessage`: added two optional fields:
  - `prevFilePaths`: `Map<fileId, PrevFilePaths>`, where `PrevFilePaths`
    holds `prevBaseFilePath` and `bootstrapBaseFilePath`.
  - `extraContext`: `Map<String,String>` for producer-attached context.

  Both default to empty maps (never null). The existing ctors are preserved; a
  new all-args ctor is generated via the existing Lombok `@AllArgsConstructor`.
- `BaseHoodieClient`: lifted the `commitCallback` field up from
`BaseHoodieWriteClient`, and added two shared methods:
  - `fireCommitCallback(commitTime, commitActionType, stats,
    BaseFileOnlyView, extraMetadata)`, which lazily constructs the callback from
    `hoodie.write.commit.callback.class` and invokes it.
  - `resolvePrevFilePaths(stats, BaseFileOnlyView)`, which, for each update
    stat, looks up the previous base file via the cached view (`getBaseFileOn`)
    and captures its path and bootstrap path.
- `BaseHoodieWriteClient`: removed the inline callback block from
`commitStats`; the callback now fires from `postCommit`. `postCommit` takes the
resolved `commitActionType` so the message reports the actual action (e.g.
`replacecommit` for `insert_overwrite`) rather than the table's base action
type.
- `BaseHoodieTableServiceClient`: fires the callback after successful
compaction (`commit` action) and clustering (`replacecommit` action).
- `TestBaseHoodieClient` (new): unit tests covering `resolvePrevFilePaths`
(inserts skipped, update resolution, bootstrap capture, missing-file skip,
best-effort on view failure, null inputs) and the message default/retention
contract.
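The resolution step described in the changelog can be sketched roughly as follows. This is a minimal stand-alone illustration, not the PR's code: `PrevFilePathsSketch`, `Stat`, `BaseFileLookup`, and `latestBaseFile` are hypothetical names, and the choice to skip only the failing stat on a view error is an assumption about what "best-effort" means here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; not the real Hudi types.
final class PrevFilePathsSketch {

  static final class PrevFilePaths {
    final String prevBaseFilePath;
    final String bootstrapBaseFilePath; // null when there is no bootstrap source

    PrevFilePaths(String prevBaseFilePath, String bootstrapBaseFilePath) {
      this.prevBaseFilePath = prevBaseFilePath;
      this.bootstrapBaseFilePath = bootstrapBaseFilePath;
    }
  }

  /** Simplified write stat: inserts create a file group, updates supersede a base file. */
  static final class Stat {
    final String partitionPath;
    final String fileId;
    final boolean isUpdate;

    Stat(String partitionPath, String fileId, boolean isUpdate) {
      this.partitionPath = partitionPath;
      this.fileId = fileId;
      this.isUpdate = isUpdate;
    }
  }

  /** Stand-in for the cached base-file view (assumed shape). */
  interface BaseFileLookup {
    Optional<PrevFilePaths> latestBaseFile(String partitionPath, String fileId);
  }

  static Map<String, PrevFilePaths> resolvePrevFilePaths(List<Stat> stats, BaseFileLookup view) {
    if (stats == null || view == null) {
      return Collections.emptyMap(); // null inputs yield an empty result
    }
    Map<String, PrevFilePaths> resolved = new HashMap<>();
    for (Stat stat : stats) {
      if (!stat.isUpdate) {
        continue; // inserts have no previous base file
      }
      try {
        // A missing base file simply leaves no entry for this file group.
        view.latestBaseFile(stat.partitionPath, stat.fileId)
            .ifPresent(prev -> resolved.put(stat.fileId, prev));
      } catch (RuntimeException e) {
        // Assumed best-effort behavior: report the failure and keep going.
        System.err.println("Skipping " + stat.fileId + ": " + e.getMessage());
      }
    }
    return resolved;
  }
}
```

The cases exercised here (inserts skipped, missing file skipped, view failure tolerated, null inputs) correspond to the test list above; everything else about the real method is left unspecified.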
No code was copied from third-party sources.
### Impact
- Public API: `HoodieWriteCommitCallbackMessage` is
`@PublicAPIClass(EVOLVING)`. The change is additive and backward compatible:
existing ctors and getters are unchanged, and new fields default to empty
maps. Existing callback implementations compile and run unchanged.
- Behavior: the callback now also fires for (a) the executor auto-commit
path and (b) compaction/clustering completions, neither of which previously
invoked it.
Consumers that assumed the callback fired only for explicit data commits will
now see additional invocations.
- Performance: prev-file resolution reuses the already cached fs view. No
additional I/O beyond what the writer already performed. Resolution and
callback invocation are best-effort and never fail the write.
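The "never fail the write" guarantee amounts to wrapping the callback invocation so that any failure is caught and logged. A minimal sketch of that pattern, assuming a simplified message type and a plain `Consumer` in place of the real callback interface (`BestEffortCallbackSketch` and its members are hypothetical names, and logging goes to `System.err` only for brevity):

```java
import java.util.function.Consumer;

// Hypothetical stand-ins for illustration only; not the real Hudi types.
final class BestEffortCallbackSketch {

  /** Simplified message carrying just the commit time and the resolved action type. */
  static final class Message {
    final String commitTime;
    final String commitActionType;

    Message(String commitTime, String commitActionType) {
      this.commitTime = commitTime;
      this.commitActionType = commitActionType;
    }
  }

  /**
   * Invokes the callback without letting its failure propagate to the caller.
   * Returns true if the callback ran to completion, false if it was absent or threw.
   */
  static boolean fireCommitCallback(Consumer<Message> callback, Message message) {
    if (callback == null) {
      return false; // no callback configured
    }
    try {
      callback.accept(message);
      return true;
    } catch (RuntimeException e) {
      // Log and swallow: the commit has already succeeded at this point.
      System.err.println("Commit callback failed for " + message.commitActionType
          + " at " + message.commitTime + ": " + e.getMessage());
      return false;
    }
  }
}
```

Because the same wrapper would be used for data commits and table-service commits, a consumer that only cares about data commits can check the action type carried by the message and return early for the others.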
### Risk Level
low
Changes are confined to one module (hudi-client-common) and the callback
path. All callback/resolution failures are caught and logged, so a misbehaving
callback or a stale/remote view cannot fail a commit. Verified with the
apache/hudi default build profile. Also:
- Added a UT: `TestBaseHoodieClient`
- Repo-wide check confirms all existing `HoodieWriteCommitCallbackMessage`
ctor/getter call sites remain compatible.
### Documentation Update
none
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable