akshayakp97 opened a new issue, #9891:
URL: https://github.com/apache/hudi/issues/9891

   
   **Describe the problem you faced**
   
   Running into the error below while using EMR 6.10.1. I want to understand whether this has been fixed in a newer Hudi version. I found a similar issue reported for Flink: https://github.com/apache/hudi/issues/6540
   
   Hudi options - 
   
   ```
   from pyspark.sql.functions import col, to_date

   if partition_field:
       keygenerator = "org.apache.hudi.keygen.ComplexKeyGenerator"
       extractor = "org.apache.hudi.hive.MultiPartKeysValueExtractor"
       hive_style = "true"
       partition_field_name = "pdate"
       # drop rows with a null partition value, then derive the partition column
       df = df.filter(df[partition_field].isNotNull())
       df = df.withColumn(
           partition_field_name, to_date(col(partition_field), "yyyy-MM-dd")
       )
   else:
       keygenerator = "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
       extractor = "org.apache.hudi.hive.NonPartitionedExtractor"
       hive_style = "false"
       partition_field_name = ""

   emr_master_ip = aws.get_emr_master_pvt_ip(region_name=region_name)
   hive_sync_jdbc = "jdbc:hive2://{}:10000".format(emr_master_ip)
   hudi_options = {
       "hoodie.table.name": tgt_table,
       "hoodie.datasource.write.recordkey.field": rec_key,
       "hoodie.datasource.write.table.type": type,
       "hoodie.datasource.write.keygenerator.class": keygenerator,
       "hoodie.datasource.write.table.name": tgt_table,
       "hoodie.datasource.write.operation": operation,
       "hoodie.datasource.write.precombine.field": precombine,
       "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
       "hoodie.payload.ordering.field": precombine,
       "hoodie.datasource.hive_sync.support_timestamp": "true",
       "hoodie.bulkinsert.shuffle.parallelism": parallelism,
       "hoodie.upsert.shuffle.parallelism": parallelism,
       "hoodie.datasource.write.hive_style_partitioning": hive_style,
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.partition_extractor_class": extractor,
       "hoodie.datasource.hive_sync.auto_create_database": "true",
       "hoodie.datasource.hive_sync.database": tgt_database,
       "hoodie.datasource.hive_sync.table": tgt_table,
       "hoodie.datasource.hive_sync.assume_date_partitioning": "false",
       "hoodie.datasource.hive_sync.use_jdbc": "true",
       "hoodie.datasource.hive_sync.jdbcurl": hive_sync_jdbc,
       "hoodie.compact.inline": "true",
       "hoodie.compact.inline.max.delta.commits": compact_inline_max_delta_commits,
       "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
       "hoodie.cleaner.commits.retained": commit_retain_count,
       "hoodie.keep.min.commits": min_commit_to_keep,
       "hoodie.keep.max.commits": max_commit_to_keep,
   }
   if partition_field:
       hudi_options["hoodie.datasource.hive_sync.partition_fields"] = partition_field_name
       hudi_options["hoodie.metadata.enable"] = "true" if enable_metadata == "Y" else "false"
       hudi_options["hoodie.datasource.write.partitionpath.field"] = partition_field_name
   ```
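
   For context, these options are applied with a plain Spark datasource write, roughly as below (the target path and save mode here are placeholders, not the exact job code):

   ```
   # sketch of how hudi_options above is used; path is redacted/illustrative
   df.write.format("hudi").options(**hudi_options).mode("append").save(
       "s3://<redacted_s3path>"
   )
   ```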
   
   **Environment Description**
   
   * Hudi version : 0.12.2
   
   * Spark version : 3.3.1
   
   * Hive version : 3.1.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   
   **Stacktrace**
   
   ```
   23/10/16 17:14:47 INFO S3NativeFileSystem: Opening 's3://<redacted_s3path>/.hoodie/hoodie.properties' for reading
   23/10/16 17:14:47 INFO S3NativeFileSystem: Opening 's3://<redacted_s3path>/pdate=2023-10-12/.16c8f46c-a073-403b-b830-553f8c44b735-0_20231013061901564.log.1_4-452-41829' for reading
   23/10/16 17:15:35 INFO S3NativeFileSystem: Opening 's3://<redacted_s3path>/pdate=2023-10-12/16c8f46c-a073-403b-b830-553f8c44b735-0_0-457-44680_20231013061901564.parquet' for reading
   23/10/16 17:15:35 ERROR BoundedInMemoryExecutor: error consuming records
   org.apache.hudi.com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
   Serialization trace:
   props (org.apache.avro.Schema$FixedSchema)
   schema (org.apache.avro.generic.GenericData$Fixed)
   orderingVal (org.apache.hudi.common.model.DefaultHoodieRecordPayload)
   data (org.apache.hudi.common.model.HoodieAvroRecord)
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.SerializationUtils$KryoSerializerInstance.deserialize(SerializationUtils.java:100) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.SerializationUtils.deserialize(SerializationUtils.java:74) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:210) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:203) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:199) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.collection.BitCaskDiskMap.get(BitCaskDiskMap.java:68) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:198) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.collection.ExternalSpillableMap.get(ExternalSpillableMap.java:55) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.io.HoodieMergeHandle.write(HoodieMergeHandle.java:340) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.table.action.commit.BaseMergeHelper$UpdateHandler.consumeOneRecord(BaseMergeHelper.java:90) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.table.action.commit.BaseMergeHelper$UpdateHandler.consumeOneRecord(BaseMergeHelper.java:80) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135) ~[hudi-spark3-bundle_2.12-0.12.2-amzn-0.jar:0.12.2-amzn-0]
       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_382]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
       at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
   Caused by: java.lang.NullPointerException
   ```
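
   From the trace, the NullPointerException surfaces while Kryo deserializes a record that the merge spilled to disk (ExternalSpillableMap -> BitCaskDiskMap), not while writing it. One experiment I am considering (a sketch only, not a confirmed fix; both config keys exist in Hudi 0.12.x, the values below are illustrative) is to give the merge more memory before it spills, or to avoid the Kryo-backed disk map entirely:

   ```
   # workaround experiment, not a confirmed fix
   hudi_options.update({
       # raise the merge buffer so fewer records spill to disk during merge
       "hoodie.memory.merge.max.size": str(2 * 1024 * 1024 * 1024),
       # or spill to RocksDB instead of the default BitCaskDiskMap
       "hoodie.common.spillable.diskmap.type": "ROCKS_DB",
   })
   ```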
   
   

