ad1happy2go opened a new pull request, #18948: URL: https://github.com/apache/hudi/pull/18948
### Change Logs

CDC before/after images inferred directly from base/log data files leaked the `_hoodie_*` meta columns, while images served from the supplemental CDC log already have them stripped at write time (`HoodieCDCLogger`). The clearest trigger is the `BASE_FILE_INSERT` case: an insert-only commit writes no CDC log file, so its change data is inferred from the base file. The result was an inconsistent, alternating-per-commit image schema where some commits' images carried meta columns and others did not.

Fix: skip the `_hoodie_*` meta columns in `InternalRowToJsonStringConverter` so every inference case produces a schema-consistent, business-columns-only image.

Tests:
- `TestInternalRowToJsonStringConverter#stripsHoodieMetaColumns`: unit regression test asserting meta columns are dropped from the JSON image.
- `TestCDCDataFrameSuite#testCDCImagesExcludeHoodieMetaFields`: functional test (MOR + inline compaction + upsert) parameterized over all `HoodieCDCSupplementalLoggingMode` values, asserting no before/after image contains a `_hoodie_*` column.

### Impact

CDC incremental query before/after images are now consistent across all commits and supplemental-logging modes, containing only business columns.

### Risk level: low

Behavior change is limited to the JSON content of CDC before/after images (removal of meta columns that should never have been present).

### Documentation Update

None required.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable

🤖 Generated with [Claude Code](https://claude.com/claude-code)
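For readers unfamiliar with the fix described under Change Logs, the idea of skipping `_hoodie_*` fields while serializing a row to a JSON image can be illustrated roughly as follows. This is a minimal standalone sketch, not the code in this PR: the class `CdcImageSketch` and its methods are hypothetical, the row is modeled as a plain `Map` rather than a Spark `InternalRow`, and the JSON rendering handles only strings, numbers, booleans, and nulls.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; the real change lives in
// InternalRowToJsonStringConverter and operates on Spark rows.
final class CdcImageSketch {
    // Prefix shared by the Hudi meta columns, as referenced in the PR text.
    static final String META_PREFIX = "_hoodie_";

    // Render a row as a JSON object, skipping every meta column so the
    // image contains business columns only. Field order follows the map's
    // iteration order.
    static String toBusinessJson(Map<String, Object> row) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> field : row.entrySet()) {
            if (field.getKey().startsWith(META_PREFIX)) {
                continue; // drop _hoodie_* columns from the image
            }
            json.add(quote(field.getKey()) + ":" + render(field.getValue()));
        }
        return json.toString();
    }

    // Simplified value rendering: numbers and booleans are emitted bare,
    // everything else is emitted as a quoted string.
    static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Minimal escaping (backslash and double quote only); a real converter
    // would delegate to a JSON library.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

With an insertion-ordered map holding `_hoodie_commit_time`, `_hoodie_record_key`, `id`, and `name`, `toBusinessJson` would emit only the `id` and `name` fields. That is the same shape of image a reader would get regardless of whether the change data came from the supplemental CDC log or was inferred from a base file.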
