[ https://issues.apache.org/jira/browse/SPARK-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363290#comment-14363290 ]
Matthew Farrellee commented on SPARK-6245:
------------------------------------------
[~srowen] Thanks for fixing this. It's nice to file a bug, go on vacation, and
see it fixed when you get back!
What do you think about including this fix in 1.3.1?
> jsonRDD() of empty RDD results in exception
> -------------------------------------------
>
> Key: SPARK-6245
> URL: https://issues.apache.org/jira/browse/SPARK-6245
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.1
> Reporter: Matthew Farrellee
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 1.4.0
>
>
> Converting an empty RDD to a JSON RDD results in an exception. This case is
> common when using Spark Streaming.
> {code}
> from pyspark import SparkContext
> from pyspark.sql import SQLContext
> sc = SparkContext()
> qsc = SQLContext(sc)
> qsc.jsonRDD(sc.parallelize([]))
> {code}
> exception:
> {noformat}
> Traceback (most recent call last):
>   File "/tmp/bug.py", line 5, in <module>
>     qsc.jsonRDD(sc.parallelize([]))
>   File "/usr/share/spark/python/pyspark/sql.py", line 1605, in jsonRDD
>     srdd = self._ssql_ctx.jsonRDD(jrdd.rdd(), samplingRatio)
>   File "/usr/share/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/usr/share/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o27.jsonRDD.
> : java.lang.UnsupportedOperationException: empty collection
> at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:886)
> at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:886)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.reduce(RDD.scala:886)
> at org.apache.spark.sql.json.JsonRDD$.inferSchema(JsonRDD.scala:57)
> at org.apache.spark.sql.SQLContext.jsonRDD(SQLContext.scala:232)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
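The trace above ends in RDD.reduce, which throws on an empty collection, called while JsonRDD infers the schema. For releases without the fix, one possible workaround is to skip empty RDDs before calling jsonRDD(). The following is only a minimal sketch under that assumption: json_rdd_or_none is a hypothetical helper name, not part of PySpark, and it relies only on RDD.take() and on SQLContext.jsonRDD() as called in the report.

```python
def json_rdd_or_none(sql_ctx, rdd):
    """Call sql_ctx.jsonRDD(rdd) only when rdd has at least one element.

    Hypothetical helper (not a PySpark API). Returns None for an empty RDD
    so that schema inference is never attempted on an empty collection.
    """
    # take(1) returns an empty list when the RDD has no elements.
    if not rdd.take(1):
        return None
    return sql_ctx.jsonRDD(rdd)
```

With the names from the report, the call would look like json_rdd_or_none(qsc, sc.parallelize([])), and the caller has to handle the None result. In a streaming job the same check could be applied to each batch RDD before converting it.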