pete91z commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-899427179
I am seeing this issue with MOR tables using Apache Spark 3.1.2 (not on AWS EMR) and Hudi 0.7.0. Could this be re-opened, please? Or is it fixed in 0.8.0?
Context: I created the table with deltastreamer. Deltastreamer appears to work fine, but when I later try to create a dataframe to read the table in pyspark I get the following error:
```
Traceback (most recent call last):
  File "./init_hudi_for_billing_ds", line 45, in <module>
    billingDF=sqlContext.read.format("hudi").load(basePath+"/*/*")
  File "/home/spark_311/py1/lib64/python3.6/dist-packages/pyspark/sql/readwriter.py", line 204, in load
    return self._df(self._jreader.load(path))
  File "/home/spark_311/py1/lib64/python3.6/dist-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/spark_311/py1/lib64/python3.6/dist-packages/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/home/spark_311/py1/lib64/python3.6/dist-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o30.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(Lorg/apache/spark/sql/SparkSession;Lscala/collection/Seq;Lscala/collection/immutable/Map;Lscala/Option;Lorg/apache/spark/sql/execution/datasources/FileStatusCache;)V
	at org.apache.hudi.HoodieSparkUtils$.createInMemoryFileIndex(HoodieSparkUtils.scala:89)
	at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:127)
	at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:89)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:53)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
```
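For reference, the string after `InMemoryFileIndex.<init>` in the `NoSuchMethodError` above is a JVM method descriptor; decoding it shows exactly which constructor Hudi 0.7.0 was compiled against (a 5-argument constructor that apparently no longer exists in Spark 3.1.2). A small self-contained sketch of such a decoder (`decode_jvm_descriptor` is a hypothetical helper for illustration, not part of Hudi, Spark, or py4j):

```python
import re

def decode_jvm_descriptor(descriptor):
    """Decode a JVM method descriptor like '(Lfoo/Bar;I)V' into
    (param_types, return_type) using dotted, human-readable names."""
    # Split '(params)ret' on the last ')'; descriptor[1:] drops the '('.
    params_part, ret = descriptor[1:].rsplit(")", 1)
    # One token per type: optional array prefixes, then an object type
    # 'L...;' or a primitive type code.
    token = re.compile(r"\[*(?:L[^;]+;|[BCDFIJSZV])")
    primitives = {"B": "byte", "C": "char", "D": "double", "F": "float",
                  "I": "int", "J": "long", "S": "short", "Z": "boolean",
                  "V": "void"}

    def pretty(t):
        arrays = 0
        while t.startswith("["):  # count array dimensions
            arrays += 1
            t = t[1:]
        if t.startswith("L"):     # object type: strip 'L'/';', dot the slashes
            base = t[1:-1].replace("/", ".")
        else:
            base = primitives[t]
        return base + "[]" * arrays

    params = [pretty(m.group(0)) for m in token.finditer(params_part)]
    return params, pretty(ret)

# The descriptor from the NoSuchMethodError above:
desc = ("(Lorg/apache/spark/sql/SparkSession;Lscala/collection/Seq;"
        "Lscala/collection/immutable/Map;Lscala/Option;"
        "Lorg/apache/spark/sql/execution/datasources/FileStatusCache;)V")
params, ret = decode_jvm_descriptor(desc)
print(ret, "<init>(" + ", ".join(params) + ")")
```

Decoded, the missing constructor takes a `SparkSession`, a `Seq`, a `Map`, an `Option`, and a `FileStatusCache`, which suggests a binary incompatibility between the Spark version Hudi 0.7.0 was built against (3.0.x) and Spark 3.1.2, rather than a configuration problem.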
I've tried setting table.type to MERGE_ON_READ in the Hudi options, but it has no effect. These errors are not seen with COPY_ON_WRITE tables.