luffyd commented on pull request #1848:
URL: https://github.com/apache/hudi/pull/1848#issuecomment-687392285


   > > @garyli1019 : I took this patch and ran it in EMR (Spark-2.4.5-amzn-0). 
I got the following exceptions when loading S3 dataset.
   > > I am using hudi-spark-bundle in the spark session.
   > > Setting default log level to "WARN".
   > > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
   > > 20/07/29 06:57:50 WARN Client: Neither spark.yarn.jars nor 
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
   > > Spark context Web UI available at 
http://ip-172-31-33-232.us-east-2.compute.internal:4040
   > > Spark context available as 'sc' (master = yarn, app id = 
application_1595775804042_9837).
   > > Spark session available as 'spark'.
   > > Welcome to
   > > 
   > > [Spark ASCII-art welcome banner]   version 2.4.5-amzn-0
   > > Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_252)
   > > Type in expressions to have them evaluated.
   > > Type :help for more information.
   > > scala> val dfh = 
spark.read.format("hudi").load("s3a://hudi.streaming.perf/orders_stream_hudi_mor_4/_/_")
   > > java.lang.NoSuchMethodError: 
org.apache.spark.sql.execution.datasources.PartitionedFile.&lt;init&gt;(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
   > > at 
org.apache.hudi.MergeOnReadSnapshotRelation$$anonfun$4.apply(MergeOnReadSnapshotRelation.scala:144)
   > > at 
org.apache.hudi.MergeOnReadSnapshotRelation$$anonfun$4.apply(MergeOnReadSnapshotRelation.scala:141)
   > > at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
   > > at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
   > > at scala.collection.Iterator$class.foreach(Iterator.scala:891)
   > > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
   > > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
   > > at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   > > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
   > > at scala.collection.AbstractTraversable.map(Traversable.scala:104)
   > > at 
org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:141)
   > > at 
org.apache.hudi.MergeOnReadSnapshotRelation.&lt;init&gt;(MergeOnReadSnapshotRelation.scala:75)
   > > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:70)
   > > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:50)
   > > at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
   > > at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
   > > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
   > > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
   > > ... 49 elided
   > > scala>
   > > Have you seen this issue before?
   > 
   > I am seeing this exception with the latest Hudi as well; how do we get it resolved?
   
   I see that Hudi is built against spark-sql_2.11-2.4.4.jar, while EMR ships 
/usr/lib/spark/jars/spark-sql_2.11-2.4.5-amzn-0.jar.
   I am not sure whether that version difference matters for the PartitionedFile class.
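   One way to check the suspected binary mismatch from inside the same spark-shell is to list, via reflection, the constructors the class on the runtime classpath actually exposes, and compare them against the `PartitionedFile.<init>(InternalRow, String, long, long, String[])` signature in the stack trace. This is only a diagnostic sketch, not a fix; the helper name is mine, and the stand-in JDK class is just there so the snippet runs anywhere:

   ```scala
   // Diagnostic sketch: print every constructor signature a class exposes on
   // the current classpath. In the spark-shell, point it at the class from
   // the stack trace to see whether the constructor Hudi was compiled
   // against exists in the EMR build of Spark.
   def constructorSignatures(className: String): Seq[String] =
     Class.forName(className).getDeclaredConstructors.map(_.toString).toSeq

   // Stand-in example on a JDK class so this runs outside Spark too:
   constructorSignatures("java.lang.StringBuilder").foreach(println)

   // Inside the EMR spark-shell, instead run:
   // constructorSignatures(
   //   "org.apache.spark.sql.execution.datasources.PartitionedFile"
   // ).foreach(println)
   ```

   If the printed signatures differ from the one in the NoSuchMethodError, that would confirm the spark-sql 2.4.4 vs 2.4.5-amzn-0 incompatibility.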


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

