Hurshal Patel created SPARK-12348:
-------------------------------------

             Summary: PySpark _inferSchema crashes with incorrect exception on an empty RDD
                 Key: SPARK-12348
                 URL: https://issues.apache.org/jira/browse/SPARK-12348
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.5.0
            Reporter: Hurshal Patel
            Priority: Minor


{code:python}
>>> rdd = sc.emptyRDD()
>>> df = sqlContext.createDataFrame(rdd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 404, in createDataFrame
    rdd, schema = self._createFromRDD(data, schema, samplingRatio)
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 285, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/home/memsql/spark/python/pyspark/sql/context.py", line 229, in _inferSchema
    first = rdd.first()
  File "/home/memsql/spark/python/pyspark/rdd.py", line 1320, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
{code}
Calling createDataFrame on an empty RDD raises "RDD is empty" from rdd.first()
instead of the intended message "The first row in RDD is empty, can not infer
schema" from sqlContext._inferSchema.
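
One possible way to surface the intended message is to check for emptiness before calling first(). The following is only a sketch and not the actual Spark fix: StubRDD and first_row_for_inference are hypothetical names used so the example runs without a Spark installation, and the stub imitates only the first() behavior shown in the traceback.

```python
INTENDED_MESSAGE = "The first row in RDD is empty, can not infer schema"


class StubRDD:
    """Hypothetical stand-in for an RDD; not part of PySpark."""

    def __init__(self, rows):
        self._rows = list(rows)

    def isEmpty(self):
        return not self._rows

    def first(self):
        # Mirrors the behavior seen in the traceback: an empty RDD raises here.
        if not self._rows:
            raise ValueError("RDD is empty")
        return self._rows[0]


def first_row_for_inference(rdd):
    """Hypothetical helper: return the first row or raise the intended message."""
    # Check emptiness first so rdd.first() never raises its own error.
    if rdd.isEmpty():
        raise ValueError(INTENDED_MESSAGE)
    first = rdd.first()
    # An empty first row (e.g. an empty tuple) also cannot be used for inference.
    if not first:
        raise ValueError(INTENDED_MESSAGE)
    return first
```

With this guard, an empty input fails with the schema-inference message rather than the one from first(). Whether an up-front emptiness check or catching the ValueError from first() is preferable inside _inferSchema is a design choice this sketch does not settle.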



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)