leesf commented on code in PR #11473:
URL: https://github.com/apache/hudi/pull/11473#discussion_r1684081505


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -110,7 +110,9 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState,
                                               hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
     //dataSchema is not always right due to spark bugs
     val partitionColumns = partitionSchema.fieldNames
-    val dataSchema = StructType(tableSchema.structTypeSchema.fields.filterNot(f => partitionColumns.contains(f.name)))
+    val preCombineField = options.getOrElse(HoodieTableConfig.PRECOMBINE_FIELD.key, "")
+    val dataSchema = StructType(tableSchema.structTypeSchema.fields.filterNot(f => partitionColumns.contains(f.name)

Review Comment:
   > I still can't reproduce it: built hudi with spark3.5 profile, ran exactly 
your code - no exceptions, it passes successfully. Are you sure your master is 
up to date?
   
   @wombatu-kun yes, the code is up to date as of 20240716, at commit 
c67cb42846bf0370627f9fb28ab4da25a7dcd403. Did you run it locally or on a 
cluster? I found that it also succeeds locally but fails on the cluster. When I 
debugged it locally, I found that `val preCombineField = 
options.getOrElse(HoodieTableConfig.PRECOMBINE_FIELD.key, "")` evaluates to an 
empty string, because `options` does not contain the key.
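
   A minimal sketch of the lookup behavior described above, not the actual Hudi 
code. It assumes `options` behaves like a plain `Map[String, String]`; the key 
string and the `PreCombineLookup` object with its helpers are hypothetical 
stand-ins for `HoodieTableConfig.PRECOMBINE_FIELD.key` and the surrounding 
reader code. It shows why a missing key silently yields `""`, and one possible 
way to make the "not configured" case explicit with `Option`.

   ```scala
   object PreCombineLookup {
     // Assumed key string, standing in for HoodieTableConfig.PRECOMBINE_FIELD.key.
     val PrecombineKey: String = "hoodie.table.precombine.field"

     // Mirrors the lookup in the diff: falls back to "" when the key is absent,
     // so a missing option is indistinguishable from an empty one.
     def preCombineField(options: Map[String, String]): String =
       options.getOrElse(PrecombineKey, "")

     // Option-based variant: None when the key is absent or its value is empty.
     def preCombineFieldOpt(options: Map[String, String]): Option[String] =
       options.get(PrecombineKey).filter(_.nonEmpty)
   }
   ```

   With an empty options map, `preCombineField` returns `""` and 
`preCombineFieldOpt` returns `None`, which matches the empty value seen when 
debugging locally.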


