Wechar created HUDI-5986:
----------------------------
Summary: empty preCombineKey should never be stored in
hoodie.properties
Key: HUDI-5986
URL: https://issues.apache.org/jira/browse/HUDI-5986
Project: Apache Hudi
Issue Type: Bug
Components: hudi-utilities
Reporter: Wechar
*Overview:*
We found that {{hoodie.properties}} keeps an empty preCombineKey entry if the
table does not have a preCombineKey. The empty preCombineKey then causes the
following exception when inserting data:
{code:bash}
Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
    at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
    at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
*Steps to Reproduce:*
{code:sql}
-- 1. create a table without a preCombineKey
CREATE TABLE default.test_hudi_default_cm (
  uuid int,
  name string,
  price double
) USING hudi
OPTIONS (
  primaryKey = 'uuid'
);
-- 2. set the write operation to insert
set hoodie.datasource.write.operation=insert;
set hoodie.merge.allow.duplicate.on.inserts=true;
-- 3. insert data
insert into default.test_hudi_default_cm select 1, 'name1', 1.1;
-- 4. insert overwrite
insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;
-- 5. this insert throws the exception
insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
{code}
*Root Cause:*
Hudi reconstructs the table when *insert overwrite table* is run in SQL while
the configured write operation is not an overwrite (here it is insert). During
that reconstruction it stores the default, empty preCombineKey in
{{hoodie.properties}}.
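*Possible direction (sketch only):*
To illustrate the idea in the summary, the snippet below drops an empty preCombineKey entry from a set of table properties before they would be persisted. It is a standalone sketch, not the actual Hudi code path: the class {{PreCombineKeyGuard}} and the method {{dropEmptyPreCombineKey}} are hypothetical names, and the property name {{hoodie.table.precombine.field}} is assumed to be the key under which the preCombineKey lands in {{hoodie.properties}}.

```java
import java.util.Properties;

public class PreCombineKeyGuard {
    // Assumption: the table-level preCombine field is stored under this key.
    static final String PRECOMBINE_KEY = "hoodie.table.precombine.field";

    /**
     * Returns a copy of the given properties in which the preCombine entry
     * is omitted when its value is empty or only whitespace.
     * All other entries are copied unchanged.
     */
    static Properties dropEmptyPreCombineKey(Properties props) {
        Properties cleaned = new Properties();
        for (String name : props.stringPropertyNames()) {
            String value = props.getProperty(name);
            boolean emptyPreCombine =
                PRECOMBINE_KEY.equals(name) && value.trim().isEmpty();
            if (!emptyPreCombine) {
                cleaned.setProperty(name, value);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Hypothetical table properties for a table created without a preCombineKey.
        Properties props = new Properties();
        props.setProperty("hoodie.table.name", "test_hudi_default_cm");
        props.setProperty(PRECOMBINE_KEY, "");

        Properties cleaned = dropEmptyPreCombineKey(props);
        System.out.println("preCombine entry kept: " + cleaned.containsKey(PRECOMBINE_KEY));
        System.out.println("table name kept: " + cleaned.containsKey("hoodie.table.name"));
    }
}
```

A non-empty preCombineKey is left untouched by this filter, so tables that do define one would not be affected. Where such a check belongs in Hudi (at table creation, at reconstruction during insert overwrite, or when the properties file is written) is not decided here.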