pete91z commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-899427179


   I am seeing this issue with MOR tables on Apache Spark 3.1.2 (not AWS EMR) and Hudi 0.7.0. Could this issue be re-opened, please? Or is it fixed in 0.8.0?
   
   Context: I created a table using DeltaStreamer. DeltaStreamer itself appears to work fine, but when I later try to read the table into a DataFrame in pyspark, I get the following error:
   
   Traceback (most recent call last):
     File "./init_hudi_for_billing_ds", line 45, in <module>
       billingDF=sqlContext.read.format("hudi").load(basePath+"/*/*")
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/pyspark/sql/readwriter.py", line 204, in load
       return self._df(self._jreader.load(path))
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/py4j/java_gateway.py", line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/pyspark/sql/utils.py", line 111, in deco
       return f(*a, **kw)
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/py4j/protocol.py", line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o30.load.
   : java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(Lorg/apache/spark/sql/SparkSession;Lscala/collection/Seq;Lscala/collection/immutable/Map;Lscala/Option;Lorg/apache/spark/sql/execution/datasources/FileStatusCache;)V
        at org.apache.hudi.HoodieSparkUtils$.createInMemoryFileIndex(HoodieSparkUtils.scala:89)
        at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:127)
        at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:89)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:53)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   
   I've tried setting the table type to MERGE_ON_READ in the Hudi options, but it has no effect. These errors do not occur with COPY_ON_WRITE tables.
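   For context, the NoSuchMethodError above points at a binary mismatch: the Hudi 0.7.0 Spark bundle appears to have been compiled against an InMemoryFileIndex constructor signature that no longer exists in Spark 3.1.x. A minimal sketch of a fail-fast version guard one could run before the load, assuming (from this stack trace only, not an official compatibility matrix) that the 0.7.0 bundle only matches Spark up to 3.0.x:

```python
# Sketch: fail fast on a Spark version the Hudi 0.7.0 bundle was not
# built against, instead of hitting a Py4JJavaError mid-read.
# Assumption (inferred from the stack trace, not an official matrix):
# the Hudi 0.7.0 Spark 3 bundle only matches Spark 3.0.x and earlier.

def hudi_070_compatible(spark_version: str) -> bool:
    """Return True if this Spark version predates the InMemoryFileIndex
    constructor change that breaks the Hudi 0.7.0 bundle."""
    major, minor = (int(p) for p in spark_version.split(".")[:2])
    return (major, minor) <= (3, 0)

print(hudi_070_compatible("3.0.1"))  # True
print(hudi_070_compatible("3.1.2"))  # False - the environment in this report
```

   In practice this check would compare `spark.version` at startup, so the job stops with a clear message rather than failing deep inside `DataFrameReader.load`.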
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

