kumudkumartirupati opened a new issue, #5976: URL: https://github.com/apache/hudi/issues/5976
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - YES

**Describe the problem you faced**

Getting an exception when col_stats is enabled for tables that have decimal values.

**A clear and concise description of the problem**

Couldn't find many of the classes used in `HoodieMetadataPayload.java` from the `org.apache.hudi.avro.model` package in the Hudi Spark bundle.

**To Reproduce**

```
/opt/spark/bin/spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer local:/opt/spark/jars/hudi-utilities-slim-bundle_2.12-0.11.1.jar \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --payload-class org.apache.hudi.common.model.OverwriteWithLatestAvroPayload \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --table-type COPY_ON_WRITE \
  --target-table default \
  --props s3a://bucket/hudi-defaults.conf \
  --config-folder s3a://bucket/configs \
  --base-path-prefix s3a://bucket/dbs \
  --source-ordering-field ts_ms \
  --op UPSERT \
  --sync-tool-classes org.apache.hudi.hive.HiveSyncTool,org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool \
  --enable-sync
```

**Expected behavior**

col_stats indexing should work when enabled on new tables for the first time.

**Environment Description**

* Hudi version : 0.11.1
* Spark version : 3.2.1
* Hadoop version : 3.3.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker?
(yes/no) : yes

**Stacktrace**

```
22/06/25 16:09:47 ERROR HoodieMultiTableDeltaStreamer: error while running MultiTableDeltaStreamer for table: test
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 314.0 failed 4 times, most recent failure: Lost task 0.3 in stage 314.0 (TID 7224) (10.11.19.222 executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.avro.model.DecimalWrapper
	at org.apache.hudi.metadata.HoodieMetadataPayload.wrapStatisticValue(HoodieMetadataPayload.java:686)
	at org.apache.hudi.metadata.HoodieMetadataPayload.lambda$createColumnStatsRecords$13(HoodieMetadataPayload.java:595)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
	at java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(Unknown Source)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(Unknown Source)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(Unknown Source)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(Unknown Source)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(Unknown Source)
	at java.base/java.util.Spliterators$1Adapter.hasNext(Unknown Source)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1498)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1408)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1472)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1295)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:384)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:335)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
```
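For context on why `DecimalWrapper` is involved here: the column-stats index wraps per-column min/max values in Avro wrapper records, and Avro's `decimal` logical type represents a value as the two's-complement big-endian bytes of its unscaled integer. A rough illustration of that encoding in Python (not Hudi code; the helper name is made up for this sketch):

```python
from decimal import Decimal

def decimal_to_avro_bytes(value: Decimal, scale: int) -> bytes:
    """Encode a decimal the way Avro's bytes-backed decimal logical type
    does: two's-complement big-endian bytes of the unscaled integer."""
    unscaled = int(value.scaleb(scale))  # e.g. 12.34 at scale 2 -> 1234
    # Reserve at least one byte, plus room for the sign bit.
    length = max(1, (unscaled.bit_length() + 8) // 8)
    return unscaled.to_bytes(length, byteorder="big", signed=True)

print(decimal_to_avro_bytes(Decimal("12.34"), 2))  # b'\x04\xd2'
```

So any table with decimal columns exercises this wrapper class as soon as column stats are computed, which is consistent with the failure appearing only once `col_stats` is enabled.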
