a0x opened a new issue, #5792: URL: https://github.com/apache/hudi/issues/5792
**Describe the problem you faced**

Updating a Hudi table through Spark SQL fails when the column being updated contains a `null` value in other records, as shown in the following image:

<img width="1633" alt="image" src="https://user-images.githubusercontent.com/3829546/172540366-24153415-5948-4f19-a47a-ec47d634af66.png">

**To Reproduce**

Steps to reproduce the behavior:

1. Initialize the table and data:

```sql
-- create table like this
create table hudi.update_null_test_cow (
  id bigint,
  name string,
  note string,
  dt string,
  ts bigint
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts'
)
partitioned by (dt)
location 's3://my-test-bucket/hudi/update_null_test_cow';

-- insert a record
insert into hudi.update_null_test_cow partition(dt = '2022-06-08')
select 1 as id, 'john doe' as name, '' as note, 1000 as ts;
```

2. Try to update the record; the update succeeds:

```sql
update hudi.update_null_test_cow set note = 'some note' where id = 1;
```

3. Insert a new record that contains a `null` value:

```sql
insert into hudi.update_null_test_cow partition(dt = '2022-06-08')
select 2 as id, 'foobar' as name, null as note, 1000 as ts;
```

4. Update the record with `id = 1` again; this fails:

```sql
update hudi.update_null_test_cow set note = 'some other note' where id = 1;
```

**Expected behavior**

The update query should complete successfully.

**Environment Description**

* Hudi version : 0.10.1
* Spark version : 3.2.0
* Hive version : 3.1.2
* Hadoop version : 3.2.0
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no

**Additional context**

The whole environment is bundled in Amazon EMR 6.6.0.

**Stacktrace**

```
Error happens in sql: update hudi.update_null_test_cow set note = 'some other note' where id = 1
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 171.0 failed 4 times, most recent failure: Lost task 0.3 in stage 171.0 (TID 8477) (executor 11): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:174)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:133)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:351)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:342)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:315)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
    at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
    ... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
    ... 32 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:356)
    at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
    at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.RuntimeException: Null-value for required field: note
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:200)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:171)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:95)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:351)
    ... 8 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2559)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2508)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2507)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2507)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1149)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1149)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1149)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2747)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2689)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2678)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2215)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255)
    at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1449)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1422)
    at org.apache.spark.rdd.RDD.$anonfun$isEmpty$1(RDD.scala:1557)
    at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
    at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1557)
    at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:657)
    at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:287)
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:169)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:112)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:108)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:495)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:108)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:136)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:848)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:382)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:355)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:247)
    at org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:79)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:115)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:112)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:108)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:495)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:108)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:221)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.zeppelin.spark.SparkSqlInterpreter.internalInterpret(SparkSqlInterpreter.java:106)
    at org.apache.zeppelin.interpreter.AbstractInterpreter.interpret(AbstractInterpreter.java:55)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:110)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:849)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:741)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
    at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
    at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:174)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:386)
    at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
    at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:133)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1474)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    ... 3 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:351)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:342)
    at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:315)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
    at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
    ... 31 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
    ... 32 more
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to merge old record into new file for key id:2 from old file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-124-6252_20220608044435168.parquet to new file s3://my-test-bucket/hudi/update_null_test_cow/dt=2022-06-08/730df40f-7973-48eb-a494-b167030bfd37-0_0-171-8477_20220608044759733.parquet with writerSchema { "type" : "record", "name" : "update_null_test_cow_record", "namespace" : "hoodie.update_null_test_cow", "fields" : [ { "name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null }, { "name" : "id", "type" : [ "null", "long" ], "default" : null }, { "name" : "name", "type" : [ "null", "string" ], "default" : null }, { "name" : "note", "type" : "string" }, { "name" : "ts", "type" : [ "null", "long" ], "default" : null }, { "name" : "dt", "type" : [ "null", "string" ], "default" : null } ] }
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:356)
    at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:122)
    at org.apache.hudi.table.action.commit.AbstractMergeHelper$UpdateHandler.consumeOneRecord(AbstractMergeHelper.java:112)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.lang.RuntimeException: Null-value for required field: note
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:200)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:171)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:95)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:351)
    ... 8 more
```
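**Possible root cause**

In the writer schema above, every field is a nullable union (`[ "null", ... ]`) except `note`, which is a plain required `"string"`; the final `Caused by: java.lang.RuntimeException: Null-value for required field: note` then fires when the copy-on-write merge rewrites the untouched record `id:2`, whose `note` is null. A plausible explanation (an assumption based on Spark semantics, not confirmed in this issue) is that the `UPDATE` assigns the non-null literal `'some other note'` to `note`, Spark infers that projected column as non-nullable, and the non-nullable column is carried into the Avro writer schema as a required field. A minimal Scala sketch of that inference behavior:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch (assumes Spark 3.2.x behavior): the nullability of a
// projected column follows the expression that produces it, which is the
// suspected source of the required "note" field in the writer schema above.
object NullabilityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullability-check")
      .getOrCreate()

    // A non-null string literal, like the one in the SET clause, yields a
    // non-nullable column, i.e. the Avro equivalent of { "name": "note", "type": "string" }.
    spark.sql("select 'some other note' as note").schema.fields
      .foreach(f => println(s"${f.name}: nullable=${f.nullable}"))  // note: nullable=false

    // A nullable expression keeps the column nullable, matching the
    // [ "null", "string" ] union used by every other field in the schema.
    spark.sql("select nullif('some other note', '') as note").schema.fields
      .foreach(f => println(s"${f.name}: nullable=${f.nullable}"))  // note: nullable=true

    spark.stop()
  }
}
```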

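**Possible workaround**

If the explanation above holds, rewriting the failing statement so the assigned value is a nullable expression may sidestep the required-field constraint until the underlying issue is fixed. A speculative sketch, continuing with the `spark` session from the previous example (untested against Hudi 0.10.1; `nullif` is just one Spark SQL expression that stays nullable):

```scala
// Speculative workaround sketch: assign `note` through an expression Spark
// treats as nullable, so the update's writer schema keeps ["null", "string"].
// Reuses the `spark` session from the sketch above; not verified on 0.10.1.
spark.sql("""
  update hudi.update_null_test_cow
  set note = nullif('some other note', '')
  where id = 1
""")
```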