leesf commented on code in PR #11473:
URL: https://github.com/apache/hudi/pull/11473#discussion_r1681098941
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala:
##########
@@ -110,7 +110,9 @@ class HoodieFileGroupReaderBasedParquetFileFormat(tableState: HoodieTableState,
 hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow] = {
 //dataSchema is not always right due to spark bugs
 val partitionColumns = partitionSchema.fieldNames
- val dataSchema = StructType(tableSchema.structTypeSchema.fields.filterNot(f => partitionColumns.contains(f.name)))
+ val preCombineField = options.getOrElse(HoodieTableConfig.PRECOMBINE_FIELD.key, "")
+ val dataSchema = StructType(tableSchema.structTypeSchema.fields.filterNot(f => partitionColumns.contains(f.name)
Review Comment:
@wombatu-kun this still throws an exception on master with this patch applied. It can be reproduced with the following code:
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("Spark Hudi Example")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
  .getOrCreate()

// MOR table where `age` is both the partition column and the preCombineField
spark.sql(
  """
    |Create Table if not exists mor_test (id bigint, name string, age bigint) using hudi partitioned by (age) TBLPROPERTIES (
    |primaryKey = 'id',
    |preCombineField = 'age',
    |'type' = 'mor'
    |) LOCATION 'xxx:///bucket/mor_test/'
    |""".stripMargin)

spark.sql("insert into mor_test values (1, 'adb', 10), (2, 'spark', 20)")
// This update is the statement that fails
spark.sql("update mor_test set name = 'adbadb' where id = 1")
spark.sql("delete from mor_test where id = 2")
spark.sql("select * from mor_test").show(false)
```
The statement `update mor_test set name = 'adbadb' where id = 1` throws an exception with the message
`Field: age does not exist in the table schema`.
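
For illustration, here is a minimal sketch of how a field can go missing from a schema in this setup. It assumes the error arises because `age` is both the partition column and the `preCombineField`, so a data schema that drops partition columns no longer contains it. `Field`, `DataSchemaSketch`, and both helper methods are hypothetical stand-ins in plain Scala, not Hudi or Spark APIs. The second helper is only one possible reading of the truncated hunk above.

```
// Hypothetical stand-in for a schema field; not a Spark or Hudi type.
final case class Field(name: String)

object DataSchemaSketch {
  // Mirrors the removed line in the hunk: drop every partition column.
  def dropPartitionColumns(table: Seq[Field], partitionColumns: Seq[String]): Seq[Field] =
    table.filterNot(f => partitionColumns.contains(f.name))

  // Assumed variant: keep a partition column when it is also the precombine field.
  def dropPartitionColumnsExcept(table: Seq[Field],
                                 partitionColumns: Seq[String],
                                 preCombineField: String): Seq[Field] =
    table.filterNot(f => partitionColumns.contains(f.name) && f.name != preCombineField)

  def main(args: Array[String]): Unit = {
    val table = Seq(Field("id"), Field("name"), Field("age"))
    val partitionColumns = Seq("age")
    // `age` is gone here, so a later lookup of the precombine field would fail.
    println(dropPartitionColumns(table, partitionColumns).map(_.name))
    // `age` survives because it is the precombine field.
    println(dropPartitionColumnsExcept(table, partitionColumns, "age").map(_.name))
  }
}
```

The sketch only models the schema filter. It does not model the update path where the error above is reported.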