fireking77 opened a new issue #4340:
URL: https://github.com/apache/hudi/issues/4340
Hi Guys!
I use HUDI with the following setup:
AWS EMR 6.4 HUDI 0.8, PySpark 3.1.2
The problem is when you try to read from a Hudi dataset incrementally:
```python
df = (spark.read
      .format('org.apache.hudi')
      .options(**{
          "hoodie.datasource.query.type": "incremental",
          "hoodie.datasource.read.begin.instanttime": <start_instant_time>,
          "hoodie.datasource.read.end.instanttime": <stop_instant_time>,
      })
      .load(base_path_ignition + partition_pattern_telemetry))
```
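One way to sidestep the error on the caller side is to check up front whether any commit instants actually fall inside the requested window before issuing the read. Hudi instant times are fixed-width `yyyyMMddHHmmss` timestamp strings, so plain lexicographic comparison matches time order; the helper below is a hypothetical sketch (not part of Hudi's API) that filters a list of commit times with the incremental semantics of begin-exclusive, end-inclusive:

```python
def commits_in_range(commit_times, begin, end):
    """Return the commit instant times that an incremental query over
    (begin, end] would pick up. Hudi instant times are fixed-width
    timestamp strings, so string comparison matches chronological order."""
    return sorted(t for t in commit_times if begin < t <= end)

# Only run the incremental read when the window is non-empty; the list of
# commit times could be gathered e.g. by listing the table's .hoodie folder.
```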
it fails when the requested time range contains no commits:
```
Traceback (most recent call last):
  File "/tmp/pycharm_project_151/aggregates_by_ignition/ignition_aggregate/ignition-aggregate.py", line 510, in <module>
    spark_dag()
  File "/tmp/pycharm_project_151/aggregates_by_ignition/ignition_aggregate/ignition-aggregate.py", line 95, in spark_dag
    .load(base_path_ignition + partition_pattern_telemetry)
  File "/usr/local/lib/python3.7/site-packages/pyspark/sql/readwriter.py", line 204, in load
    return self._df(self._jreader.load(path))
  File "/usr/local/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.7/site-packages/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o221.load.
: java.util.NoSuchElementException: No value present in Option
	at org.apache.hudi.common.util.Option.get(Option.java:88)
	at org.apache.hudi.MergeOnReadIncrementalRelation.buildFileIndex(MergeOnReadIncrementalRelation.scala:173)
	at org.apache.hudi.MergeOnReadIncrementalRelation.<init>(MergeOnReadIncrementalRelation.scala:79)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:109)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Process finished with exit code 1
```
Of course it can be handled on the caller side, but it would be better either to return an empty DataFrame with 0 partitions (so that `new_ignition_inc_df.rdd.getNumPartitions() == 0`), or to raise a meaningful exception instead of the bare `java.util.NoSuchElementException: No value present in Option`.
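Until Hudi itself returns an empty DataFrame, a stopgap is to catch the failure and substitute an empty result. This is only a sketch, assuming the only thing the caller can match on is the "No value present in Option" message that py4j surfaces; `load_fn` and `empty_fn` are hypothetical callables standing in for the actual `spark.read...load(...)` call and something like `spark.createDataFrame([], schema)`:

```python
def read_incremental_or_empty(load_fn, empty_fn):
    """Run `load_fn`; if it fails with the known 'No value present in
    Option' error (raised when the incremental window has no commits),
    fall back to `empty_fn()`. Any other error is re-raised unchanged."""
    try:
        return load_fn()
    except Exception as exc:  # py4j wraps the Java exception; match on text
        if "No value present in Option" in str(exc):
            return empty_fn()
        raise
```

Matching on the exception message is brittle, which is exactly why a dedicated exception type (or an empty DataFrame) from Hudi would be preferable.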
Thanks,
Darvi