leesf commented on code in PR #11473:
URL: https://github.com/apache/hudi/pull/11473#discussion_r1684081505
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -110,7 +110,9 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState,
 hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
 //dataSchema is not always right due to spark bugs
 val partitionColumns = partitionSchema.fieldNames
- val dataSchema = StructType(tableSchema.structTypeSchema.fields.filterNot(f => partitionColumns.contains(f.name)))
+ val preCombineField = options.getOrElse(HoodieTableConfig.PRECOMBINE_FIELD.key, "")
+ val dataSchema = StructType(tableSchema.structTypeSchema.fields.filterNot(f => partitionColumns.contains(f.name)
Review Comment:
> I still can't reproduce it: built hudi with spark3.5 profile, ran exactly
> your code - no exceptions, it passes successfully. Are you sure your master is
> up to date?
@wombatu-kun Yes, the code is up to date as of 20240716, at commit
c67cb42846bf0370627f9fb28ab4da25a7dcd403. Did you run it locally or on a
cluster? I found that it also succeeds locally but fails on the cluster. When I
debugged locally, I found that
`val preCombineField = options.getOrElse(HoodieTableConfig.PRECOMBINE_FIELD.key, "")`
evaluates to an empty string, since `options` does not contain the key.
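
For illustration, here is a minimal sketch of the lookup behaviour described above, using a plain Scala `Map` in place of the real `options`. The object name `PreCombineLookup`, its helper methods, and the sample option values are hypothetical. The key literal is an assumed stand-in for `HoodieTableConfig.PRECOMBINE_FIELD.key`. The sketch does not try to reproduce the cluster failure. It only shows that `getOrElse` with a `""` default makes a missing key indistinguishable from an empty value, and that an `Option`-based lookup keeps the absence visible.

```scala
// Hypothetical, self-contained sketch; not the code in the PR.
object PreCombineLookup {
  // Assumed stand-in for HoodieTableConfig.PRECOMBINE_FIELD.key.
  val PrecombineKey: String = "hoodie.table.precombine.field"

  // Mirrors the lookup in the diff: a missing key silently becomes "".
  def lookupWithDefault(options: Map[String, String]): String =
    options.getOrElse(PrecombineKey, "")

  // Alternative: keep "not configured" explicit and treat blank values as missing.
  def lookupAsOption(options: Map[String, String]): Option[String] =
    options.get(PrecombineKey).map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Sample option maps; the values are made up for the example.
    val withKey = Map(PrecombineKey -> "ts")
    val withoutKey = Map("path" -> "/tmp/hudi_table")

    println(s"with key:    default=[${lookupWithDefault(withKey)}] option=${lookupAsOption(withKey)}")
    println(s"without key: default=[${lookupWithDefault(withoutKey)}] option=${lookupAsOption(withoutKey)}")
  }
}
```

With the `Option` variant, a caller has to decide explicitly what to do when the field is not configured, rather than passing an empty column name further down.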