poocb opened a new issue, #9390: URL: https://github.com/apache/hudi/issues/9390
**Describe the problem you faced**

I would like to disable the precombine logic during upsert. According to the Hudi docs, the option `hoodie.combine.before.upsert` should do this, but it does not: the upsert throws the error shown in the Stacktrace section below.

**To Reproduce**

Steps to reproduce the behavior:

1. Set `hoodie.datasource.write.precombine.field` to `MY_COLUMN`.
2. Set `hoodie.combine.before.upsert` to `"false"`.
3. Insert, for the first time, a record whose `MY_COLUMN` value is NULL. The write to Hudi succeeds.
4. Upsert the same record. The write fails.

**Expected behavior**

Since `hoodie.combine.before.upsert` is set to `"false"`, the upsert should go through.

For comparison, we also tested the option `hoodie.combine.before.insert`, changing it from its default value `"false"` to `"true"`. When we rerun the steps, the write now fails at step 3, which is expected and shows that `hoodie.combine.before.insert` does work.

**Environment Description**

* Hudi version : 0.12.1
* Spark version : 3.3.x
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

Studying the [source code](https://raw.githubusercontent.com/apache/hudi/release-0.12.1/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala) of the same version we run, the condition at line 288 that sets the variable `shouldCombine` does not take `COMBINE_BEFORE_UPSERT` into account.

For the time being, we'll work around this by following the comment by bvaradar on [Aug 21](https://github.com/apache/hudi/issues/1960) and using the record key as our precombine field.

**Stacktrace**

```
Caused by: org.apache.hudi.exception.HoodieException: The value of MY_COLUMN can not be null
    at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:532)
    at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$11(HoodieSparkSqlWriter.scala:297)
```
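To make the reported behavior concrete, here is a minimal Python model of the combine decision. It is not Hudi's code: it is a simplified sketch that assumes the `shouldCombine` condition combines on every upsert plus whenever `hoodie.combine.before.insert` is true, and it models the writer as plain functions instead of Spark. The function names (`should_combine_as_reported`, `should_combine_as_expected`, `ordering_value`) are hypothetical, and the default of `"true"` for `hoodie.combine.before.upsert` is an assumption.

```python
from typing import Any, Callable, Mapping, Optional

# Option keys from this issue. The defaults passed to flag() below are assumptions.
PRECOMBINE_FIELD = "hoodie.datasource.write.precombine.field"
COMBINE_BEFORE_INSERT = "hoodie.combine.before.insert"  # default "false" per the issue
COMBINE_BEFORE_UPSERT = "hoodie.combine.before.upsert"  # assumed default "true"


def flag(opts: Mapping[str, str], key: str, default: str) -> bool:
    # Writer options are strings; "true" in any case counts as enabled.
    return opts.get(key, default).strip().lower() == "true"


def should_combine_as_reported(opts: Mapping[str, str], operation: str) -> bool:
    # Hypothetical, simplified model of the reported behavior: an upsert always
    # combines and hoodie.combine.before.upsert is never consulted.
    return operation == "upsert" or flag(opts, COMBINE_BEFORE_INSERT, "false")


def should_combine_as_expected(opts: Mapping[str, str], operation: str) -> bool:
    # Hypothetical variant that consults the flag matching the operation.
    if operation == "upsert":
        return flag(opts, COMBINE_BEFORE_UPSERT, "true")
    return flag(opts, COMBINE_BEFORE_INSERT, "false")


def ordering_value(
    record: Mapping[str, Any],
    opts: Mapping[str, str],
    operation: str,
    should_combine: Callable[[Mapping[str, str], str], bool],
) -> Optional[Any]:
    # The precombine value is read only when combining; a null value is then
    # rejected, mirroring the HoodieException in the stack trace.
    if not should_combine(opts, operation):
        return None
    field = opts[PRECOMBINE_FIELD]
    value = record.get(field)
    if value is None:
        raise ValueError(f"The value of {field} can not be null")
    return value


opts = {PRECOMBINE_FIELD: "MY_COLUMN", COMBINE_BEFORE_UPSERT: "false"}
record = {"id": 1, "MY_COLUMN": None}

# Step 3: the insert succeeds because nothing is combined.
print(ordering_value(record, opts, "insert", should_combine_as_reported))  # None

# Step 4: the reported logic still combines on upsert and fails on the null value.
try:
    ordering_value(record, opts, "upsert", should_combine_as_reported)
except ValueError as err:
    print(err)  # The value of MY_COLUMN can not be null

# With the flag honored, the same upsert skips the precombine lookup.
print(ordering_value(record, opts, "upsert", should_combine_as_expected))  # None
```

In this model, the only difference between the two variants is that the upsert branch reads `hoodie.combine.before.upsert`; the insert path already behaves as described in the comparison test above.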
