rahultoall opened a new issue #2302:
URL: https://github.com/apache/hudi/issues/2302
Hi,
I am facing an issue when I try to sync my Hudi table to Hive using the Spark DataSource API.

- Spark version: 2.4.7
- spark-avro: spark-avro_2.11-2.4.7
- hudi-spark: hudi-spark-bundle_2.11-0.6.0

I have set the following properties in the Spark conf as well:

```
spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.sql.hive.convertMetastoreParquet = false
```

I have also added **hudi-hadoop-mr-bundle-0.6.0.jar** to Hive's aux-path.
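For context, this is roughly how the shell is launched with the conf above; a sketch with illustrative jar paths (the `/path/to/` locations are placeholders, not the actual paths used):

```shell
# Illustrative launch only; jar locations are placeholders.
spark-shell \
  --jars /path/to/hudi-spark-bundle_2.11-0.6.0.jar,/path/to/spark-avro_2.11-2.4.7.jar \
  --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
  --conf "spark.sql.hive.convertMetastoreParquet=false"
```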
The following is the snippet I used to write a DataFrame to Hudi while syncing it to Hive:
```scala
df.write.format("hudi")
  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "trip_id")
  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "createdDate")
  .option(HoodieWriteConfig.TABLE_NAME, "trips_hive")
  .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "trips_hive")
  .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://<ip>:10000")
  .option(DataSourceWriteOptions.HIVE_USER_OPT_KEY, "<hive_username>")
  .option(DataSourceWriteOptions.HIVE_PASS_OPT_KEY, "<hive_password>")
  .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.NonPartitionedExtractor")
  .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .option("hoodie.upsert.shuffle.parallelism", "4")
  .mode(SaveMode.Overwrite)
  .save(<hdfs path>)
```
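If the write and hive sync succeed, the synced table can be checked from the Hive side; a sketch using beeline with the same placeholder connection details as in the snippet:

```shell
# Verify the synced table from Hive (placeholders as in the snippet above).
beeline -u "jdbc:hive2://<ip>:10000" -n "<hive_username>" -p "<hive_password>" \
  -e "SHOW CREATE TABLE trips_hive; SELECT trip_id, destination FROM trips_hive LIMIT 5;"
```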
The DataFrame that I tried to insert is:

```
+-------+----------+-----------+------------------+
|trip_id|route_type|destination|       createdDate|
+-------+----------+-----------+------------------+
|   1001|         B|   New York|2020-12-7 12:30:33|
|   1002|         C| New Jersey|2020-12-7 12:30:33|
|   1003|         D|Los Angeles|2020-12-7 12:30:33|
|   1004|         E|  Las Vegas|2020-12-7 12:30:33|
|   1005|         F|     Tucson|2020-12-7 12:30:33|
|   1004|         E|  Las Vegas|2020-12-7 12:30:38|
+-------+----------+-----------+------------------+
```
I get the following exception when I try to execute the above:
```
753927 [Executor task launch worker for task 6] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 3.0 (TID 6)
java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:140)
	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:139)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at scala.collection.AbstractIterator.to(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
	at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
```
753951 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 3.0 (TID 6, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
```
(followed by the same stack trace as the executor error above)
When I explored the jars, I found that both **hudi-spark-bundle_2.11-0.6.0.jar** and **hudi-hadoop-mr-bundle-0.6.0.jar** contain the HoodieAvroUtils class.
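One way to compare what each bundle actually ships, assuming both jars sit in the current directory, is to list the class in each jar and disassemble one copy with the JDK tools:

```shell
# Confirm which bundles contain the class (jar names as above).
for j in hudi-spark-bundle_2.11-0.6.0.jar hudi-hadoop-mr-bundle-0.6.0.jar; do
  echo "== $j"
  unzip -l "$j" | grep 'org/apache/hudi/avro/HoodieAvroUtils.class'
done

# Extract one copy and check the getNestedFieldVal signature it exposes.
unzip -o hudi-hadoop-mr-bundle-0.6.0.jar 'org/apache/hudi/avro/HoodieAvroUtils.class' -d /tmp/mr-bundle
javap -classpath /tmp/mr-bundle org.apache.hudi.avro.HoodieAvroUtils | grep getNestedFieldVal
```

If the mr-bundle's copy lacks the three-argument `getNestedFieldVal(GenericRecord, String, boolean)` overload that the stack trace expects, a classpath where that jar shadows the spark bundle could explain the `NoSuchMethodError`; that is a guess on my part, not something I have confirmed.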
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]