zafer-sahin commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-769721260


   Hi, I am still getting a similar error.
   
   
   
   ```python
   >>> hudi_options_insert = {
   ...     "hoodie.table.name": "the_table_name",
   ...     "hoodie.datasource.write.storage.type": "MERGE_ON_READ",
   ...     "hoodie.datasource.write.table.type": "MERGE_ON_READ",
   ...     "hoodie.datasource.write.recordkey.field": "id",
   ...     "hoodie.datasource.write.operation": "bulk_insert",
   ...     "hoodie.datasource.write.partitionpath.field": "ds",
   ...     "hoodie.datasource.write.precombine.field": "id",
   ...     "hoodie.insert.shuffle.parallelism": 135
   ...     }
   >>> df.write.format("hudi").options(**hudi_options_insert).mode("overwrite").save(S3_MERGE_ON_READ)
   ```
   
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 178, in load
       return self._df(self._jreader.load(path))
     File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 128, in deco
       return f(*a, **kw)
     File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o87.load.
   : java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(Lorg/apache/spark/sql/SparkSession;Lscala/collection/Seq;Lscala/collection/immutable/Map;Lscala/Option;Lorg/apache/spark/sql/execution/datasources/FileStatusCache;)V
        at org.apache.hudi.HoodieSparkUtils$.createInMemoryFileIndex(HoodieSparkUtils.scala:89)
        at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:127)
        at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:89)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:53)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:232)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   ```
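
   For what it's worth, a `NoSuchMethodError` on the `InMemoryFileIndex` constructor is the usual symptom of a Hudi bundle built for one Spark major version running against another (the constructor signature changed between Spark 2 and Spark 3), so the bundle on the classpath has to match the cluster's Spark line. A minimal sketch of that check — the helper name and the `0.7.0` bundle version are my own assumptions, not something confirmed in this issue:

   ```python
   # Hedged sketch: map the running Spark version to the matching Hudi
   # bundle coordinate. In PySpark you would pass the result to
   # `spark-submit --packages ...`; here the mapping itself is what matters.
   def hudi_bundle_for(spark_version: str) -> str:
       """Return the Maven coordinate of the Hudi bundle for this Spark line."""
       major = int(spark_version.split(".")[0])
       if major >= 3:
           # Spark 3 clusters need the Spark-3 bundle (Scala 2.12).
           return "org.apache.hudi:hudi-spark3-bundle_2.12:0.7.0"
       # Spark 2 clusters use the original bundle (Scala 2.11).
       return "org.apache.hudi:hudi-spark-bundle_2.11:0.7.0"

   print(hudi_bundle_for("3.0.1"))  # org.apache.hudi:hudi-spark3-bundle_2.12:0.7.0
   print(hudi_bundle_for("2.4.7"))  # org.apache.hudi:hudi-spark-bundle_2.11:0.7.0
   ```

   In a live session `spark.version` gives the string to feed this; if the two disagree with what was passed via `--packages` or dropped into the jars directory, that mismatch alone reproduces the trace above.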


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
