hudi-agent commented on code in PR #18948:
URL: https://github.com/apache/hudi/pull/18948#discussion_r3382268101
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##########
@@ -959,4 +959,59 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
}
assertTrue(newRecordFound, "Should have found the new record with complex data types in CDC")
}
+
+ /**
+ * Regression test for HUDI-14363: the CDC incremental query must produce before/after images
+ * that contain only business columns, never the _hoodie_* meta columns. Before the fix, images
+ * inferred directly from base/log data files (e.g. the BASE_FILE_INSERT case, which is hit by an
+ * insert-only commit that writes no CDC log file) leaked the meta columns, while images served
+ * from the supplemental CDC log did not - producing an inconsistent, alternating-per-commit
+ * schema. This reproduces the reporter's scenario (MOR + inline compaction every delta commit +
+ * upsert inserts) and asserts no image carries meta fields for any supplemental logging mode.
+ */
+ @ParameterizedTest
+ @EnumSource(classOf[HoodieCDCSupplementalLoggingMode])
+ def testCDCImagesExcludeHoodieMetaFields(loggingMode: HoodieCDCSupplementalLoggingMode): Unit = {
+ val options = commonOpts ++ Map(
+ DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
Review Comment:
🤖 nit: the adjacent options use typed constants
(`DataSourceWriteOptions.TABLE_TYPE.key()`,
`HoodieTableConfig.CDC_SUPPLEMENTAL_LOGGING_MODE.key`), so for consistency
the inline-compaction options here should use
`HoodieCompactionConfig.INLINE_COMPACT.key()` and
`HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key()` instead of
the bare string literals. That also insulates the test from key renames.
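
A rough sketch of what the suggested options map could look like, not the PR's actual code. The wrapper object `CdcTestOptions` and the helper `cdcMorOptions` are hypothetical names used only for illustration. The values `"true"` and `"1"` are assumptions inferred from the docstring's "inline compaction every delta commit", since the quoted hunk is cut off before those lines:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.common.table.HoodieTableConfig
import org.apache.hudi.common.table.cdc.HoodieCDCSupplementalLoggingMode
import org.apache.hudi.config.HoodieCompactionConfig

// Hypothetical wrapper; in the test this map would be built inline from `commonOpts`.
object CdcTestOptions {

  // Builds the write options using typed config constants instead of raw key strings.
  def cdcMorOptions(
      commonOpts: Map[String, String],
      loggingMode: HoodieCDCSupplementalLoggingMode): Map[String, String] =
    commonOpts ++ Map(
      DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
      HoodieTableConfig.CDC_SUPPLEMENTAL_LOGGING_MODE.key -> loggingMode.name(),
      // Assumed values: enable inline compaction and trigger it after every delta commit.
      HoodieCompactionConfig.INLINE_COMPACT.key() -> "true",
      HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key() -> "1"
    )
}
```

With the constants in place, a renamed config key would surface as a compile error in the test rather than as a silently ignored option.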
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>