[GitHub] [hudi] fripple commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

GitBox Tue, 26 Jan 2021 07:35:04 -0800


fripple commented on issue #2013:
URL: https://github.com/apache/hudi/issues/2013#issuecomment-767624984



   @vinothchandar Will this work with a table created using an older version of 
hudi?
   
   When I try to do this using 0.7.0 (and spark 3.0.0 on emr 6.1, and 
spark-avro_2.12-3.0.0), I get the following error:
   incremental_read_options = {
       'hoodie.datasource.query.type': 'incremental',
       'hoodie.datasource.read.begin.instanttime': beginTime - 1
   }
   incremental = spark.read.format("org.apache.hudi"). \
       options(**incremental_read_options). \
       load(basePath)
   
   
   An error occurred while calling o127.load.
   : java.lang.NoSuchMethodError: 
org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
        at 
org.apache.hudi.MergeOnReadIncrementalRelation.$anonfun$buildFileIndex$7(MergeOnReadIncrementalRelation.scala:195)
        at scala.collection.immutable.List.map(List.scala:286)
        at 
org.apache.hudi.MergeOnReadIncrementalRelation.buildFileIndex(MergeOnReadIncrementalRelation.scala:189)
        at 
org.apache.hudi.MergeOnReadIncrementalRelation.<init>(MergeOnReadIncrementalRelation.scala:80)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:53)
        at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
        at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
        at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   
   Traceback (most recent call last):
     File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", 
line 178, in load
       return self._df(self._jreader.load(path))
     File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", 
line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 
131, in deco
       return f(*a, **kw)
     File "/usr/lib/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", 
line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o127.load.
   : java.lang.NoSuchMethodError: 
org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
        at 
org.apache.hudi.MergeOnReadIncrementalRelation.$anonfun$buildFileIndex$7(MergeOnReadIncrementalRelation.scala:195)
        at scala.collection.immutable.List.map(List.scala:286)
        at 
org.apache.hudi.MergeOnReadIncrementalRelation.buildFileIndex(MergeOnReadIncrementalRelation.scala:189)
        at 
org.apache.hudi.MergeOnReadIncrementalRelation.<init>(MergeOnReadIncrementalRelation.scala:80)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:53)
        at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
        at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:279)
        at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:268)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:268)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] fripple commented on issue #2013: [SUPPORT] MoR tables SparkDataSource Incremental Querys

Reply via email to