rahultoall opened a new issue #2302:
URL: https://github.com/apache/hudi/issues/2302


   Hi,
   
   I am facing an issue when I try to sync my Hudi table to Hive using the Spark DataSource API.
   Spark version - 2.4.7
   spark-avro - spark-avro_2.11-2.4.7
   hudi-spark - hudi-spark-bundle_2.11-0.6.0
   
   I have set the following properties in the Spark conf as well:
   spark.serializer = org.apache.spark.serializer.KryoSerializer
   spark.sql.hive.convertMetastoreParquet = false
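
   For context, this is roughly how the session is built with those properties (a minimal sketch; the app name is a placeholder, not my actual value):

    import org.apache.spark.sql.SparkSession

    // Sketch of the session setup; "hudi-hive-sync-test" is just a placeholder app name.
    val spark = SparkSession.builder()
      .appName("hudi-hive-sync-test")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      .getOrCreate()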
   
   I have also added **hudi-hadoop-mr-bundle-0.6.0.jar** to Hive's aux path.
   
   The following is the snippet I used to write a DataFrame to Hudi with Hive sync enabled:
   
    import org.apache.hudi.DataSourceWriteOptions
    import org.apache.hudi.config.HoodieWriteConfig
    import org.apache.spark.sql.SaveMode

    df.write.format("hudi")
      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
      .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "trip_id")
      .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "createdDate")
      .option(HoodieWriteConfig.TABLE_NAME, "trips_hive")
      .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "trips_hive")
      .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
      .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, "jdbc:hive2://<ip>:10000")
      .option(DataSourceWriteOptions.HIVE_USER_OPT_KEY, "<hive_username>")
      .option(DataSourceWriteOptions.HIVE_PASS_OPT_KEY, "<hive_password>")
      .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.NonPartitionedExtractor")
      .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
      .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
      .option("hoodie.upsert.shuffle.parallelism", "4")
      .mode(SaveMode.Overwrite)
      .save("<hdfs path>")
   
   The DataFrame that I tried to insert is:
   
   +-------+----------+-----------+------------------+
   |trip_id|route_type|destination|       createdDate|
   +-------+----------+-----------+------------------+
   |   1001|         B|   New York|2020-12-7 12:30:33|
   |   1002|         C| New Jersey|2020-12-7 12:30:33|
   |   1003|         D|Los Angeles|2020-12-7 12:30:33|
   |   1004|         E|  Las Vegas|2020-12-7 12:30:33|
   |   1005|         F|     Tucson|2020-12-7 12:30:33|
   |   1004|         E|  Las Vegas|2020-12-7 12:30:38|
   +-------+----------+-----------+------------------+
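
   For reproduction, the DataFrame above can be built roughly like this (a minimal sketch; the actual source in my job differs, and holding createdDate as a plain string column is an assumption):

    // Sketch of the input DataFrame shown above.
    // Assumption: createdDate is kept as a string, not a timestamp.
    import spark.implicits._

    val df = Seq(
      (1001, "B", "New York",    "2020-12-7 12:30:33"),
      (1002, "C", "New Jersey",  "2020-12-7 12:30:33"),
      (1003, "D", "Los Angeles", "2020-12-7 12:30:33"),
      (1004, "E", "Las Vegas",   "2020-12-7 12:30:33"),
      (1005, "F", "Tucson",      "2020-12-7 12:30:33"),
      (1004, "E", "Las Vegas",   "2020-12-7 12:30:38")
    ).toDF("trip_id", "route_type", "destination", "createdDate")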
   
   I get the following exception when I execute the above:
   
   753927 [Executor task launch worker for task 6] ERROR org.apache.spark.executor.Executor  - Exception in task 0.0 in stage 3.0 (TID 6)
   java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:140)
           at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:139)
           at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:394)
           at scala.collection.Iterator$class.foreach(Iterator.scala:891)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
           at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
           at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
           at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
           at scala.collection.AbstractIterator.to(Iterator.scala:1334)
           at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
           at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
           at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
           at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
           at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$31.apply(RDD.scala:1409)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:745)
   753951 [task-result-getter-0] WARN  org.apache.spark.scheduler.TaskSetManager  - Lost task 0.0 in stage 3.0 (TID 6, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(Lorg/apache/avro/generic/GenericRecord;Ljava/lang/String;Z)Ljava/lang/Object;
           (stack trace identical to the one above)
   
   
   When I explored the jars, I found that both **hudi-spark-bundle_2.11-0.6.0** and **hudi-hadoop-mr-bundle-0.6.0.jar** contain the HoodieAvroUtils class, which makes me suspect the executors may be picking up an incompatible copy of that class.
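
   To narrow down which copy the JVM actually loads, something like this could be run in the same Spark session (a small sketch using standard classloader/reflection inspection, nothing Hudi-specific; the error comes from an executor, so running the same check there would be even more telling):

    // Print which jar HoodieAvroUtils is loaded from, and list the
    // getNestedFieldVal overloads that are actually visible at runtime.
    val clazz = Class.forName("org.apache.hudi.avro.HoodieAvroUtils")
    println(clazz.getProtectionDomain.getCodeSource.getLocation)
    clazz.getMethods.filter(_.getName == "getNestedFieldVal").foreach(println)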
   

