rahultoall opened a new issue #2302: URL: https://github.com/apache/hudi/issues/2302
Hi, I am facing an issue when I try to sync my Hudi table to Hive using the Spark DataSource API.

- Spark version: 2.4.7
- spark-avro: spark-avro_2.11-2.4.7
- hudi-spark: hudi-spark-bundle_2.11-0.6.0

I have set the following properties in the Spark conf as well (a sketch of the full session setup appears after the sample data below):

```
spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.sql.hive.convertMetastoreParquet = false
```

I have also added **hudi-hadoop-mr-bundle-0.6.0.jar** to Hive's aux-path.

The following is the snippet I used to write a DataFrame to Hudi with Hive sync enabled:

```scala
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

df.write.format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "trip_id")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "createdDate")
  .option(HoodieWriteConfig.TABLE_NAME, "trips_hive")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "trips_hive")
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://<ip>:10000")
  .option(DataSourceWriteOptions.HIVE_USER_OPT_KEY, "<hive_username>")
  .option(DataSourceWriteOptions.HIVE_PASS_OPT_KEY, "<hive_password>")
  .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.NonPartitionedExtractor")
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option("hoodie.upsert.shuffle.parallelism", "4")
  .mode(SaveMode.Overwrite)
  .save("<hdfs path>")
```

The DataFrame that I tried to insert is:

```
+-------+----------+-----------+------------------+
|trip_id|route_type|destination|       createdDate|
+-------+----------+-----------+------------------+
|   1001|         B|   New York|2020-12-7 12:30:33|
|   1002|         C| New Jersey|2020-12-7 12:30:33|
|   1003|         D|Los Angeles|2020-12-7 12:30:33|
|   1004|         E|  Las Vegas|2020-12-7 12:30:33|
|   1005|         F|     Tucson|2020-12-7 12:30:33|
|   1004|         E|  Las Vegas|2020-12-7 12:30:38|
+-------+----------+-----------+------------------+
```
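For a self-contained repro, the session and the test DataFrame can be set up like this (a minimal sketch: the app name is a placeholder, and the column types, `trip_id` as Int and the rest as String, are assumptions, not taken from the failing job):

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the session setup; the two conf entries are the ones listed above.
val spark = SparkSession.builder()
  .appName("hudi-hive-sync-repro") // hypothetical name
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.sql.hive.convertMetastoreParquet", "false")
  .getOrCreate()

import spark.implicits._

// Test data matching the table shown above (column types assumed).
val df = Seq(
  (1001, "B", "New York",    "2020-12-7 12:30:33"),
  (1002, "C", "New Jersey",  "2020-12-7 12:30:33"),
  (1003, "D", "Los Angeles", "2020-12-7 12:30:33"),
  (1004, "E", "Las Vegas",   "2020-12-7 12:30:33"),
  (1005, "F", "Tucson",      "2020-12-7 12:30:33"),
  (1004, "E", "Las Vegas",   "2020-12-7 12:30:38")
).toDF("trip_id", "route_type", "destination", "createdDate")
```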
I get the following exception when I execute the write:

```
753927 [Executor task launch worker for task 6] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 3.0 (TID 6)
java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:140)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:139)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
753951 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 3.0 (TID 6, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
	... (same stack trace as above)
```

When I explored the jars, I found that both **hudi-spark-bundle_2.11-0.6.0** and **hudi-hadoop-mr-bundle-0.6.0.jar** contain the HoodieAvroUtils class.
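Since a `NoSuchMethodError` at runtime usually means the class was loaded from a jar carrying an older or incompatible copy, one way to confirm which bundle is actually serving the class is a quick reflection check (a minimal sketch using only standard JVM APIs; run it in the same spark-shell/session that hits the error):

```scala
// Which jar on the classpath actually serves HoodieAvroUtils?
val cls = Class.forName("org.apache.hudi.avro.HoodieAvroUtils")
println(Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation))

// Does the loaded class expose the overload the writer is calling,
// i.e. getNestedFieldVal(GenericRecord, String, boolean) from the stack trace?
cls.getMethods
  .filter(_.getName == "getNestedFieldVal")
  .foreach(println)
```

If the printed location points at the hadoop-mr bundle rather than the spark bundle, or the three-argument overload from the stack trace is missing from the output, the two bundles ship conflicting copies of the class and classpath ordering is the likely culprit.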