[ 
https://issues.apache.org/jira/browse/SPARK-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363290#comment-14363290
 ] 

Matthew Farrellee commented on SPARK-6245:
------------------------------------------

[~srowen] Thanks for fixing this. It's nice to file a bug, go on vacation, and 
see it fixed when you get back!

What do you think about including this fix in 1.3.1 as well?

> jsonRDD() of empty RDD results in exception
> -------------------------------------------
>
>                 Key: SPARK-6245
>                 URL: https://issues.apache.org/jira/browse/SPARK-6245
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>            Reporter: Matthew Farrellee
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 1.4.0
>
>
> Calling jsonRDD() on an empty RDD results in an exception. This case is 
> common when using Spark Streaming.
> {code}
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
> sc = SparkContext()
> qsc = SQLContext(sc)
> qsc.jsonRDD(sc.parallelize([]))
> {code}
> exception:
> {noformat}
> Traceback (most recent call last):
>   File "/tmp/bug.py", line 5, in <module>
>     qsc.jsonRDD(sc.parallelize([]))
>   File "/usr/share/spark/python/pyspark/sql.py", line 1605, in jsonRDD
>     srdd = self._ssql_ctx.jsonRDD(jrdd.rdd(), samplingRatio)
>   File "/usr/share/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/usr/share/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o27.jsonRDD.
> : java.lang.UnsupportedOperationException: empty collection
>       at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:886)
>       at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:886)
>       at scala.Option.getOrElse(Option.scala:120)
>       at org.apache.spark.rdd.RDD.reduce(RDD.scala:886)
>       at org.apache.spark.sql.json.JsonRDD$.inferSchema(JsonRDD.scala:57)
>       at org.apache.spark.sql.SQLContext.jsonRDD(SQLContext.scala:232)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>       at py4j.Gateway.invoke(Gateway.java:259)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
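
The trace above shows the exception coming from RDD.reduce, called by JsonRDD.inferSchema, when the collection is empty. The same failure mode can be sketched in plain Python without Spark. This is only an illustration of the pattern, not the actual Spark code or the fix that was committed; merge_schemas, infer_schema_unsafe and infer_schema_safe are hypothetical names, and a "schema" is reduced to a set of field names for simplicity.

```python
from functools import reduce


def merge_schemas(a, b):
    """Hypothetical stand-in for schema merging: union of field names."""
    return a | b


def infer_schema_unsafe(records):
    # Reduce with no initial value: fails on empty input, which is
    # analogous to the "empty collection" error in the trace above.
    return reduce(merge_schemas, (set(r) for r in records))


def infer_schema_safe(records):
    # Supplying an initial value (a fold) makes empty input well defined:
    # the result is simply an empty schema.
    return reduce(merge_schemas, (set(r) for r in records), set())


if __name__ == "__main__":
    rows = [{"a": 1}, {"b": 2}]
    print(sorted(infer_schema_safe(rows)))
    print(infer_schema_safe([]))
    try:
        infer_schema_unsafe([])
    except TypeError as exc:
        print("unsafe variant failed:", exc)
```

On affected versions, a caller-side guard along the same lines would be to skip jsonRDD() when rdd.take(1) returns an empty list, for example inside a streaming batch handler.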



