Hi Selvaraj,
Even though the incoming batch has non-null values for the new column, the existing 
data does not have this column. So you need to make sure the Avro schema declares the 
new column as nullable, so that it stays backward compatible.
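For illustration, a backward-compatible way to declare a newly added column in Avro is as a union with "null" plus a null default, so readers of existing files (which lack the field) resolve it to null instead of failing. The record and field layout below is a made-up sketch, not your actual schema; the stdlib-only Python snippet just encodes that rule:

```python
import json

# Hypothetical Avro schema fragment (illustration only): the new column is a
# union with "null" and carries a null default, so old records that lack the
# field resolve to null during schema resolution instead of erroring out.
schema = json.loads("""
{
  "type": "record",
  "name": "profile",
  "fields": [
    {"name": "rule_profile_id_list",
     "type": ["null", {"type": "array", "items": "string"}],
     "default": null}
  ]
}
""")

def is_backward_compatible(field):
    """A newly added field is safe to read against old data when its type is
    a union containing "null" and it declares a null default."""
    t = field["type"]
    nullable = isinstance(t, list) and "null" in t
    return nullable and field.get("default", "missing") is None

print(is_backward_compatible(schema["fields"][0]))  # prints True
```

A required field (plain type, no default), by contrast, fails this check, which matches the "Null-value for required field" error below.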
Balaji.V
On Friday, August 21, 2020, 10:06:40 AM PDT, selvaraj periyasamy <selvaraj.periyasamy1...@gmail.com> wrote:
 Hi,

With Hudi 0.5.0, I am using the COW table type, partitioned by yyyymmdd. We
already have a table with Array<String> columns and data populated. We are now
trying to add a new column ("rule_profile_id_list") to the DataFrame, and the
write fails with the exception below. I have made sure that the DataFrame I
pass has non-null values for it, since it is a non-nullable column per the
schema definition in the DataFrame. I don't use "--conf
spark.sql.hive.convertMetastoreParquet=false" because I already handle it with
the code snippet below.

sparkSession.sparkContext.hadoopConfiguration.setClass(
  "mapreduce.input.pathFilter.class",
  classOf[org.apache.hudi.hadoop.HoodieROTablePathFilter],
  classOf[org.apache.hadoop.fs.PathFilter]);


Could someone help me to resolve this error?

20/08/21 08:38:30 WARN TaskSetManager: Lost task 8.0 in stage 151.0 (TID 31217, sl73caehdn0811.visa.com, executor 10): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :8
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
    ... 30 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
    ... 31 more
Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more

20/08/21 08:38:30 INFO TaskSetManager: Starting task 8.1 in stage 151.0 (TID 31238, sl73caehdn0615.visa.com, executor 100, partition 8, PROCESS_LOCAL, 7661 bytes)
20/08/21 08:38:30 INFO TaskSetManager: Finished task 14.0 in stage 151.0 (TID 31223) in 1269 ms on sl73caehdn0615.visa.com (executor 100) (1/29)
20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_9 in memory on sl73caehdn0709.visa.com:34428 (size: 379.0 B, free: 5.1 GB)
20/08/21 08:38:30 INFO TaskSetManager: Finished task 9.0 in stage 151.0 (TID 31218) in 1663 ms on sl73caehdn0709.visa.com (executor 62) (2/29)
20/08/21 08:38:30 INFO BlockManagerInfo: Added rdd_329_23 in memory on sl73caehdn0716.visa.com:45986 (size: 372.0 B, free: 5.1 GB)
20/08/21 08:38:30 INFO TaskSetManager: Finished task 23.0 in stage 151.0 (TID 31232) in 1754 ms on sl73caehdn0716.visa.com (executor 99) (3/29)
20/08/21 08:38:31 INFO TaskSetManager: Lost task 8.1 in stage 151.0 (TID 31238) on sl73caehdn0615.visa.com, executor 100: org.apache.hudi.exception.HoodieUpsertException (Error upserting bucketType UPDATE for partition :8) [duplicate 1]
20/08/21 08:38:31 INFO TaskSetManager: Starting task 8.2 in stage 151.0 (TID 31239, sl73caehdn0711.visa.com, executor 81, partition 8, PROCESS_LOCAL, 7661 bytes)
20/08/21 08:38:31 INFO BlockManagerInfo: Added broadcast_130_piece0 in memory on sl73caehdn0711.visa.com:43376 (size: 89.1 KB, free: 5.1 GB)
20/08/21 08:38:31 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 32 to 10.160.39.149:43212
20/08/21 08:38:31 WARN TaskSetManager: Lost task 6.0 in stage 151.0 (TID 31215, sl73caehdn0423.visa.com, executor 48): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :6
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:264)
    at org.apache.hudi.HoodieWriteClient.lambda$upsertRecordsInternal$507693af$1(HoodieWriteClient.java:428)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:847)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1109)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:178)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:257)
    ... 28 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:142)
    at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:200)
    ... 30 more
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:140)
    ... 31 more
Caused by: java.lang.RuntimeException: Null-value for required field: rule_profile_id_list
    at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:170)
    at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:142)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:293)
    at org.apache.hudi.io.storage.HoodieParquetWriter.writeAvro(HoodieParquetWriter.java:101)
    at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:288)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:432)
    at org.apache.hudi.table.HoodieCopyOnWriteTable$UpdateHandler.consumeOneRecord(HoodieCopyOnWriteTable.java:422)
    at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
    at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
  
