cshuo commented on code in PR #18911:
URL: https://github.com/apache/hudi/pull/18911#discussion_r3354488264


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java:
##########
@@ -56,8 +56,8 @@ public enum HoodieFileFormat {
   LANCE(".lance");
 
   public static final String LANCE_SPARK_ONLY_ERROR_MSG =

Review Comment:
   Can we also rename the field (`LANCE_SPARK_ONLY_ERROR_MSG`)?



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/reader/function/HoodieCdcSplitReaderFunction.java:
##########
@@ -278,8 +284,24 @@ private ClosableIterator<HoodieRecord<RowData>> getFileSliceHoodieRecordIterator
     }
   }
 
-  /** Reads a parquet CDC base file returning required-schema records. */
+  /** Reads a CDC base file returning required-schema records. */
   private ClosableIterator<RowData> getBaseFileIterator(String path) throws IOException {
+    if (path.endsWith(HoodieFileFormat.LANCE.getFileExtension())) {

Review Comment:
   The Lance base-file reader setup is now duplicated between 
`MergeOnReadInputFormat` and `HoodieCdcSplitReaderFunction`. Could we extract a 
small shared helper for building the selected `DataType` / requested 
`HoodieSchema` and opening `HoodieRowDataLanceReader`? That would reduce drift 
if schema, predicate, or close/error handling needs to change later.
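
   A rough sketch of the shape such a helper could take, not a proposed
   implementation. Every type and name below (`LanceBaseFileReaderSketch`,
   `LanceReaderFactory`, `selectFields`, `openBaseFile`) is a hypothetical
   stand-in so the snippet compiles on its own; the real helper would use
   Hudi's `ClosableIterator`, `HoodieSchema`, and `HoodieRowDataLanceReader`,
   whose exact signatures are not assumed here.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: none of these types are Hudi classes.
final class LanceBaseFileReaderSketch {

  // Assumed to match HoodieFileFormat.LANCE.getFileExtension().
  static final String LANCE_EXTENSION = ".lance";

  /** Stand-in for a closable record iterator. */
  interface ClosableIterator<T> extends Iterator<T>, Closeable {
    @Override
    void close();
  }

  /** Stand-in for whatever actually opens the Lance file with a projected schema. */
  interface LanceReaderFactory<T> {
    ClosableIterator<T> open(String path, List<String> requestedFields) throws IOException;
  }

  private LanceBaseFileReaderSketch() {
  }

  /** The extension check both call sites currently perform inline. */
  static boolean isLanceFile(String path) {
    return path.endsWith(LANCE_EXTENSION);
  }

  /** Projects the table's field list down to the required positions, in request order. */
  static List<String> selectFields(List<String> allFields, int[] requiredPos) {
    List<String> selected = new ArrayList<>(requiredPos.length);
    for (int pos : requiredPos) {
      selected.add(allFields.get(pos));
    }
    return selected;
  }

  /** Single entry point that both readers could delegate to. */
  static <T> ClosableIterator<T> openBaseFile(
      String path,
      List<String> allFields,
      int[] requiredPos,
      LanceReaderFactory<T> factory) throws IOException {
    if (!isLanceFile(path)) {
      throw new IllegalArgumentException("Not a Lance base file: " + path);
    }
    return factory.open(path, selectFields(allFields, requiredPos));
  }
}
```

   With something along these lines, each `getBaseFileIterator` would keep only
   the extension check and a one-line delegation, so schema projection and
   close/error handling live in one place.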



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java:
##########
@@ -270,6 +276,22 @@ private void mayShiftInputSplit(MergeOnReadInputSplit split) throws IOException
   }
 
   protected ClosableIterator<RowData> getBaseFileIterator(String path) throws IOException {
+    if (path.endsWith(HoodieFileFormat.LANCE.getFileExtension())) {

Review Comment:
   Could we add a CDC coverage case for Lance MOR here? This PR adds Lance 
base-file handling for `BASE_FILE_INSERT` in both the Source V2 CDC reader and 
the legacy CDC path, but the new tests only cover snapshot reads. A CDC-enabled 
test that exercises a Lance MOR base-file CDC inference case would make this 
path much safer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.