David Vogelbacher created SPARK-27778:
-----------------------------------------
Summary: toPandas with arrow enabled fails for DF with no partition
Key: SPARK-27778
URL: https://issues.apache.org/jira/browse/SPARK-27778
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 3.0.0
Reporter: David Vogelbacher
Calling {{toPandas}} with {{spark.sql.execution.arrow.enabled: true}} fails for
dataframes with no partitions. The error is an {{EOFError}}. With
{{spark.sql.execution.arrow.enabled: false}} the conversion succeeds.
Repro (on current master branch):
{noformat}
>>> from pyspark.sql.types import *
>>> schema = StructType([StructField("field1", StringType(), True)])
>>> df = spark.createDataFrame(sc.emptyRDD(), schema)
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> df.toPandas()
/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py:2162:
UserWarning: toPandas attempted Arrow optimization because
'spark.sql.execution.arrow.enabled' is set to true, but has reached the error
below and can not continue. Note that
'spark.sql.execution.arrow.fallback.enabled' does not have an effect on
failures in the middle of computation.
warnings.warn(msg)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line
2143, in toPandas
batches = self._collectAsArrow()
File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line
2205, in _collectAsArrow
results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 210,
in load_stream
num = read_int(stream)
File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 810,
in read_int
raise EOFError
EOFError
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "false")
>>> df.toPandas()
Empty DataFrame
Columns: [field1]
Index: []
{noformat}
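Until this is fixed, one possible workaround is to temporarily disable Arrow for the conversion, since the non-Arrow path handles zero-partition dataframes correctly (as shown above). A minimal sketch of a hypothetical helper (not part of PySpark; assumes an active SparkSession {{spark}}):

```python
def to_pandas_without_arrow(spark, df):
    """Workaround sketch for SPARK-27778: convert a DataFrame to pandas
    with Arrow temporarily disabled, restoring the original setting after."""
    key = "spark.sql.execution.arrow.enabled"
    original = spark.conf.get(key)
    spark.conf.set(key, "false")
    try:
        # Non-Arrow path works even when the DataFrame has no partitions.
        return df.toPandas()
    finally:
        # Restore whatever value was configured before.
        spark.conf.set(key, original)
```

This only sidesteps the bug for the affected call; the Arrow serving code path itself still needs to handle the zero-partition case.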