poocb opened a new issue, #9390:
URL: https://github.com/apache/hudi/issues/9390

   **Describe the problem you faced**
   
   I would like to disable the precombine logic during upsert. According to the Hudi docs, the option "hoodie.combine.before.upsert" should do this, but it does not: the write throws the error shown in the Stacktrace section below.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Set hoodie.datasource.write.precombine.field to MY_COLUMN.
   2. Set hoodie.combine.before.upsert to "false".
   3. Insert a record whose MY_COLUMN value is NULL; the write to Hudi succeeds.
   4. Upsert the same record; the write fails.
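
   The steps above can be sketched as the following write options (a minimal, hypothetical sketch: the record key column `id` and the option dict name are placeholders, not taken from a real job; the option keys themselves are the ones named in the steps):

   ```python
   # Hypothetical Hudi write options reproducing the report.
   # "id" is an assumed record key column, used only for illustration.
   hudi_options = {
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.precombine.field": "MY_COLUMN",  # step 1
       "hoodie.combine.before.upsert": "false",                  # step 2
       "hoodie.datasource.write.operation": "upsert",            # step 4
   }

   # In a PySpark job these options would be passed roughly like:
   # df.write.format("hudi").options(**hudi_options).mode("append").save(path)
   ```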
   
   **Expected behavior**
   
   With hoodie.combine.before.upsert set to "false", the upsert should go through.
   For comparison, we also tested "hoodie.combine.before.insert" by changing it from its default of "false" to "true" and rerunning the steps. This time the write failed at Step 3, which is expected and shows that "hoodie.combine.before.insert" does work.
   
   **Environment Description**
   
   * Hudi version : 0.12.1
   
   * Spark version : 3.3.x
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Studying the [source code](https://raw.githubusercontent.com/apache/hudi/release-0.12.1/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala) for the same version, the condition that sets the variable "shouldCombine" at line 288 does not take COMBINE_BEFORE_UPSERT into account.
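
   A Python paraphrase of that condition, as I read it in the 0.12.1 Scala source (the exact flag names and grouping are my interpretation, not a verbatim port), would look like this. The point is that `combine_before_upsert` is never consulted:

   ```python
   def should_combine(insert_drop_dups: bool, operation: str,
                      combine_before_insert: bool,
                      combine_before_upsert: bool) -> bool:
       """Paraphrase of the reported shouldCombine logic in
       HoodieSparkSqlWriter (0.12.1). combine_before_upsert is
       intentionally unused here, mirroring the reported bug:
       any upsert combines, regardless of the option's value."""
       return (insert_drop_dups
               or operation == "upsert"
               or combine_before_insert)

   # Even with hoodie.combine.before.upsert=false, an upsert still combines:
   # should_combine(False, "upsert", False, combine_before_upsert=False) -> True
   ```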
   
   For the time being, we will work around this by following bvaradar's comment from [Aug 21](https://github.com/apache/hudi/issues/1960) and using the record key as our precombine field.
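
   As a sketch, the workaround amounts to pointing both options at the same column (the column name `id` is a placeholder; the idea, per the linked comment, is that the record key is never null, so the precombine lookup cannot fail):

   ```python
   # Hypothetical workaround: reuse the record key column as the
   # precombine field, so a NULL precombine value cannot occur.
   workaround_options = {
       "hoodie.datasource.write.recordkey.field": "id",   # assumed key column
       "hoodie.datasource.write.precombine.field": "id",  # same as record key
   }
   ```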
   
   **Stacktrace**
   
   ```
   Caused by: org.apache.hudi.exception.HoodieException: The value of MY_COLUMN can not be null
        at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:532)
        at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$11(HoodieSparkSqlWriter.scala:297)
   ```
   
   

