yaojiejia opened a new pull request, #18458: URL: https://github.com/apache/hudi/pull/18458
### Describe the issue this Pull Request addresses

Closes #15942

When HoodieStreamer is configured to ingest JSON records, a single invalid record (e.g., plain text instead of valid JSON) causes the entire pipeline to crash with a `HoodieJsonToAvroConversionException`. The only existing workaround is configuring the full error table infrastructure, which is heavyweight for users who simply want to skip bad records and continue processing.

### Summary and Changelog

Adds a new config `hoodie.streamer.source.json.skip.invalid.records` (default `false`) that allows HoodieStreamer to skip invalid JSON records instead of crashing. When enabled, bad records are logged at WARN level and dropped. This reuses the existing safe conversion methods (`fromJsonWithError`, `fromJsonToRowWithError`) that were previously only available when the error table was configured.

- Added `SKIP_INVALID_JSON_RECORDS` config property to `HoodieStreamerConfig`
- Modified `SourceFormatAdapter.transformJsonToGenericRdd()` to skip bad records during Avro conversion when the config is enabled
- Modified `SourceFormatAdapter.transformJsonToRowRdd()` to skip bad records during Row conversion when the config is enabled
- Modified `SourceFormatAdapter.fetchNewDataInRowFormat()` to use Spark's PERMISSIVE mode to skip corrupt records in the Spark JSON read path when the config is enabled
- Added 3 tests: skip in Avro format, skip in Row format, and default crash behavior preserved

### Impact

New user-facing config: `hoodie.streamer.source.json.skip.invalid.records`

- Default: `false` (no behavior change for existing users)
- When `true`: invalid JSON records are skipped with WARN-level logging instead of crashing the pipeline

No breaking changes. No public API changes. No performance impact for users who do not enable the config.

### Risk Level

Low. The feature is opt-in and default behavior is unchanged.
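For reviewers who want the intended semantics at a glance, here is a minimal, language-neutral sketch of the skip-versus-fail behavior described above. It is not the code in this PR: `convert_records` and `skip_invalid` are hypothetical names, Python's `json.loads` stands in for the JSON-to-Avro and JSON-to-Row conversion, and `ValueError` stands in for `HoodieJsonToAvroConversionException`.

```python
import json
import logging

logger = logging.getLogger("json_skip_sketch")


def convert_records(raw_records, skip_invalid=False):
    """Parse each JSON string in a batch, skipping or failing on bad input.

    `skip_invalid` is a hypothetical stand-in for
    hoodie.streamer.source.json.skip.invalid.records.
    """
    converted = []
    for raw in raw_records:
        try:
            converted.append(json.loads(raw))
        except json.JSONDecodeError as err:
            if not skip_invalid:
                # Default behavior: one bad record aborts the whole batch.
                raise ValueError(f"Failed to convert record: {raw!r}") from err
            # Opt-in behavior: log at WARN level and drop the record.
            logger.warning("Skipping invalid JSON record: %r", raw)
    return converted


batch = ['{"id": 1}', "plain text, not JSON", '{"id": 2}']
kept = convert_records(batch, skip_invalid=True)
```

With `skip_invalid=True` the two valid records survive and the plain-text record is dropped with a warning; with the default `skip_invalid=False` the same batch raises on the second record, mirroring the unchanged default behavior.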
### Documentation Update

None. A follow-up PR will update the documentation if this PR is merged.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
