Wechar created HUDI-5986:
----------------------------

             Summary: empty preCombineKey should never be stored in hoodie.properties
                 Key: HUDI-5986
                 URL: https://issues.apache.org/jira/browse/HUDI-5986
             Project: Apache Hudi
          Issue Type: Bug
          Components: hudi-utilities
            Reporter: Wechar


*Overview:*
We found that {{hoodie.properties}} keeps an empty preCombineKey when the table 
is created without a preCombineKey. The empty preCombineKey then causes the 
following exception when inserting data:
{code:bash}
Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
        at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
        at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}

*Steps to Reproduce:*
{code:sql}
-- 1. create a table without preCombineKey
CREATE TABLE default.test_hudi_default_cm (
  uuid int,
  name string,
  price double
) USING hudi
options (
 primaryKey='uuid');

-- 2. set the write operation to insert
set hoodie.datasource.write.operation=insert;
set hoodie.merge.allow.duplicate.on.inserts=true;

-- 3. insert data
insert into default.test_hudi_default_cm select 1, 'name1', 1.1;

-- 4. insert overwrite
insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;

-- 5. insert data again; this statement throws the exception
insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
{code}

*Root Cause:*
Hudi reconstructs the table when the SQL statement is *insert overwrite table* 
but the configured write operation is not an insert overwrite. During that 
reconstruction it stores the default empty preCombineKey in {{hoodie.properties}}.
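
The root cause above suggests a guard at the point where the table properties 
are written. The following is only a minimal sketch of that idea, not the 
actual Hudi fix: {{PreCombineGuard}} and {{setPreCombineField}} are 
hypothetical names, and the property key {{hoodie.table.precombine.field}} is 
assumed to be the one under which Hudi stores the preCombineKey.

```java
import java.util.Properties;

// Hypothetical helper (not part of Hudi): never persist an empty preCombineKey.
final class PreCombineGuard {
    // Assumed name of the preCombineKey entry in hoodie.properties.
    static final String PRECOMBINE_KEY = "hoodie.table.precombine.field";

    private PreCombineGuard() {}

    // Stores the preCombine field only when it is non-blank.
    // A null or blank value removes the key instead of writing an empty entry.
    static void setPreCombineField(Properties tableProps, String preCombineField) {
        if (preCombineField == null || preCombineField.trim().isEmpty()) {
            tableProps.remove(PRECOMBINE_KEY);
        } else {
            tableProps.setProperty(PRECOMBINE_KEY, preCombineField.trim());
        }
    }
}
```

With such a guard, a table created without a preCombineKey would have no 
preCombineKey entry at all in its properties, so a later insert would not try 
to look up a field with an empty name.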




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
