shangxinli opened a new pull request, #18765: URL: https://github.com/apache/hudi/pull/18765
### Describe the issue this Pull Request addresses

Closes #18750. Migrates `HoodieStreamerWriteStatusValidator` (HSWSV) into the pre-commit validator framework (#18068, #18362, #18405).

### Summary and Changelog

Deletes HSWSV and replaces it with explicit pre-commit orchestration in `StreamSync`. HSWSV's three concerns are extracted into named single-purpose helpers; the framework-wired equivalent is added as an opt-in validator.

- **`SparkWriteErrorValidator`** (new): `BasePreCommitValidator` for write errors. Opt-in. `failure.policy=FAIL` mirrors `commitOnErrors=false`; `WARN_LOG` mirrors `commitOnErrors=true`.
- **`SuccessfulRecordCounter`** (new): pure counting; supports error-table unification.
- **`ErrorTableCommitter`** (new): error-table commit; returns success/failure for caller-driven strategy handling.
- **`WriteErrorReporter`** (new): top-N errored-status logging.
- **`StreamSync.writeToSinkAndDoMetaSync()`**: orchestrates explicitly: run validators → count → commit error table → apply write-error gate (preserves `commitOnErrors`) → `writeClient.commit()` without the `WriteStatusValidator` callback.
- **HSWSV deleted** (~100 LOC).
- **`HoodiePreCommitValidatorConfig.VALIDATOR_CLASS_NAMES`** doc references the new validator.

### Impact

- **`WriteStatusValidator` interface preserved.** `DataSourceUtils.SparkDataSourceWriteStatusValidator` is another active caller in the Spark datasource path, so the hook stays; only the HoodieStreamer registration is removed.
- **No behavior change** for users who don't configure `hoodie.precommit.validators`. The inline error gate in `StreamSync` preserves HSWSV semantics. `commitOnErrors` continues to work.
- **`ROLLBACK_COMMIT` / `LOG_ERROR`** strategies preserved. Because the orchestration now runs *before* `writeClient.commit()`, `ROLLBACK_COMMIT` no longer needs to roll back: the commit simply doesn't happen.

### Risk Level

**medium**: touches the hot path of every HoodieStreamer commit.
Semantics are equivalent to HSWSV by construction, but the call site moves from inside `writeClient.commit()` to before it.

Verified:

- `test-compile`: BUILD SUCCESS
- `TestSparkKafkaOffsetValidator,TestSparkValidationContext,TestSparkStreamerValidatorUtils,TestSparkWriteErrorValidator,TestSuccessfulRecordCounter`: 49/49
- `TestStreamSync,TestHoodieStreamerUtils`: 51/51
- checkstyle: 0
- RAT: 0

Not run locally: full `TestHoodieDeltaStreamer` integration suite; relying on CI.

### Documentation Update

`VALIDATOR_CLASS_NAMES` Javadoc updated to list `SparkWriteErrorValidator`. Each new helper has class-level Javadoc explaining its single responsibility. No website changes needed: the `commitOnErrors` user-facing config is unchanged.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
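
For reviewers, the orchestration order described under Summary and Changelog (run validators → count → commit error table → apply write-error gate → commit) can be pictured with a minimal, self-contained sketch. Nothing below is Hudi code: `WriteResult`, `Outcome`, `run`, the `trace` list, and the `ErrorTableFailureStrategy` enum are hypothetical stand-ins (only the strategy names `ROLLBACK_COMMIT` and `LOG_ERROR` and the `commitOnErrors` flag come from this description), and the real `StreamSync` logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Illustrative only: none of these types are Hudi classes.
class PreCommitOrchestrationSketch {

  /** Hypothetical stand-in for one write result (not Hudi's WriteStatus). */
  static final class WriteResult {
    final long totalRecords;
    final long errorRecords;

    WriteResult(long totalRecords, long errorRecords) {
      this.totalRecords = totalRecords;
      this.errorRecords = errorRecords;
    }
  }

  /** Strategy names are taken from the PR text; the enum itself is an assumption. */
  enum ErrorTableFailureStrategy { ROLLBACK_COMMIT, LOG_ERROR }

  /** Hypothetical outcomes of the pre-commit sequence. */
  enum Outcome { COMMITTED, SKIPPED_WRITE_ERRORS, SKIPPED_ERROR_TABLE_FAILURE }

  /** Pure counting: records written without error across all results. */
  static long countSuccessful(List<WriteResult> results) {
    long successful = 0;
    for (WriteResult r : results) {
      successful += r.totalRecords - r.errorRecords;
    }
    return successful;
  }

  /** True if any result reports at least one errored record. */
  static boolean hasWriteErrors(List<WriteResult> results) {
    for (WriteResult r : results) {
      if (r.errorRecords > 0) {
        return true;
      }
    }
    return false;
  }

  /** Runs the steps in order; every step happens before the data-table commit. */
  static Outcome run(List<WriteResult> results,
                     boolean commitOnErrors,
                     ErrorTableFailureStrategy strategy,
                     BooleanSupplier errorTableCommit,
                     List<String> trace) {
    trace.add("validators");           // 1. opt-in pre-commit validators would run here
    long successful = countSuccessful(results);
    trace.add("count=" + successful);  // 2. count successful records

    trace.add("errorTable");           // 3. commit the error table; the caller applies the strategy
    if (!errorTableCommit.getAsBoolean()
        && strategy == ErrorTableFailureStrategy.ROLLBACK_COMMIT) {
      // Nothing to roll back: the data-table commit has not started yet.
      return Outcome.SKIPPED_ERROR_TABLE_FAILURE;
    }

    trace.add("errorGate");            // 4. write-error gate, preserving commitOnErrors
    if (hasWriteErrors(results) && !commitOnErrors) {
      return Outcome.SKIPPED_WRITE_ERRORS;
    }

    trace.add("commit");               // 5. stand-in for the data-table commit
    return Outcome.COMMITTED;
  }

  public static void main(String[] args) {
    List<WriteResult> results =
        Arrays.asList(new WriteResult(100, 0), new WriteResult(50, 2));
    List<String> trace = new ArrayList<>();
    Outcome outcome =
        run(results, false, ErrorTableFailureStrategy.LOG_ERROR, () -> true, trace);
    System.out.println(outcome + " " + trace);
  }
}
```

The point of the sketch is the ordering: because both gates sit ahead of the commit step, a failed error-table commit under `ROLLBACK_COMMIT` or a write error with `commitOnErrors=false` returns before any commit is attempted, so there is nothing to undo.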
