hudi-agent commented on code in PR #18948:
URL: https://github.com/apache/hudi/pull/18948#discussion_r3382268101


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/cdc/TestCDCDataFrameSuite.scala:
##########
@@ -959,4 +959,59 @@ class TestCDCDataFrameSuite extends HoodieCDCTestBase {
     }
     assertTrue(newRecordFound, "Should have found the new record with complex data types in CDC")
   }
+
+  /**
+   * Regression test for HUDI-14363: the CDC incremental query must produce before/after images
+   * that contain only business columns, never the _hoodie_* meta columns. Before the fix, images
+   * inferred directly from base/log data files (e.g. the BASE_FILE_INSERT case, which is hit by an
+   * insert-only commit that writes no CDC log file) leaked the meta columns, while images served
+   * from the supplemental CDC log did not - producing an inconsistent, alternating-per-commit
+   * schema. This reproduces the reporter's scenario (MOR + inline compaction every delta commit +
+   * upsert inserts) and asserts no image carries meta fields for any supplemental logging mode.
+   */
+  @ParameterizedTest
+  @EnumSource(classOf[HoodieCDCSupplementalLoggingMode])
+  def testCDCImagesExcludeHoodieMetaFields(loggingMode: HoodieCDCSupplementalLoggingMode): Unit = {
+    val options = commonOpts ++ Map(
+      DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,

Review Comment:
   🤖 nit: the adjacent options use typed constants (`DataSourceWriteOptions.TABLE_TYPE.key()`, `HoodieTableConfig.CDC_SUPPLEMENTAL_LOGGING_MODE.key`), so for consistency use `HoodieCompactionConfig.INLINE_COMPACT.key()` and `HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key()` here instead of the bare string literals. That also insulates the test from key renames.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag quality.</i></sub>
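
   For illustration, the suggestion could look roughly like the sketch below. The quoted hunk ends before the compaction entries, so the values `"true"` and `"1"` are assumptions inferred from the Javadoc ("inline compaction every delta commit"), and the `CdcTestOptions.withInlineCompaction` helper is hypothetical and only there to make the fragment stand alone; in the test the map would stay inline.

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.common.table.HoodieTableConfig
import org.apache.hudi.common.table.cdc.HoodieCDCSupplementalLoggingMode
import org.apache.hudi.config.HoodieCompactionConfig

// Hypothetical helper (not part of the PR): builds the write options with
// typed config constants instead of bare string keys.
object CdcTestOptions {
  def withInlineCompaction(
      commonOpts: Map[String, String],
      loggingMode: HoodieCDCSupplementalLoggingMode): Map[String, String] =
    commonOpts ++ Map(
      DataSourceWriteOptions.TABLE_TYPE.key() -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
      HoodieTableConfig.CDC_SUPPLEMENTAL_LOGGING_MODE.key() -> loggingMode.name(),
      // Assumed values: enable inline compaction and trigger it after every delta commit.
      HoodieCompactionConfig.INLINE_COMPACT.key() -> "true",
      HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key() -> "1"
    )
}
```

   Referencing the `ConfigProperty` constants means that a renamed key surfaces as a compile error in the test rather than as a silently ignored option.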


