Hyukjin Kwon created SPARK-23380:
------------------------------------
Summary: Make toPandas fall back to the non-Arrow path
when the schema is not supported
Key: SPARK-23380
URL: https://issues.apache.org/jira/browse/SPARK-23380
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 2.3.0
Reporter: Hyukjin Kwon
It seems we can check the schema ahead of time in toPandas and fall back to the non-Arrow path when Arrow does not support it.
Please see this case below:
{code}
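df = spark.createDataFrame([[{'a': 1}]])  # single column of type map<string,bigint>
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df.toPandas()  # works: non-Arrow path
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()  # fails: Arrow path rejects the map column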
{code}
{code}
...
py4j.protocol.Py4JJavaError: An error occurred while calling o42.collectAsArrowToPython.
...
java.lang.UnsupportedOperationException: Unsupported data type: map<string,bigint>
{code}
In the case of {{createDataFrame}}, we already fall back to the non-Arrow path, so the call
at least works even though the optimisation is not applied.
{code}
df = spark.createDataFrame([[{'a': 1}]])
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
pdf = df.toPandas()
spark.createDataFrame(pdf).show()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.createDataFrame(pdf).show()  # warns, then falls back to the non-Arrow path
{code}
{code}
...
... UserWarning: Arrow will not be used in createDataFrame: Error inferring Arrow type ...
+--------+
|      _1|
+--------+
|[a -> 1]|
+--------+
{code}
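A minimal sketch of the proposed behaviour, written in plain Python so it runs without a Spark session. It is not the actual PySpark implementation: {{to_pandas_with_fallback}}, {{arrow_supports}}, and the two collector callables are hypothetical names, and the schema is modelled as a list of type-name strings. The only unsupported type assumed here is {{map}}, taken from the error above.

```python
import warnings


def arrow_supports(type_names):
    """Return True if no column type contains a map (hypothetical check).

    Only `map` is treated as unsupported, because that is the type
    reported in the error above; a real check would cover more types.
    """
    return not any("map<" in name for name in type_names)


def to_pandas_with_fallback(type_names, arrow_enabled,
                            collect_with_arrow, collect_without_arrow):
    """Check the schema first, then pick the Arrow or non-Arrow path.

    `collect_with_arrow` and `collect_without_arrow` are hypothetical
    zero-argument callables standing in for the two conversion paths.
    """
    if arrow_enabled:
        if arrow_supports(type_names):
            return collect_with_arrow()
        # Mirror the createDataFrame behaviour: warn, then fall back.
        warnings.warn(
            "Arrow will not be used in toPandas: unsupported type in %s"
            % (type_names,),
            UserWarning,
        )
    return collect_without_arrow()
```

With this shape, the failing case above would emit a {{UserWarning}} and return the non-Arrow result instead of raising, which matches what {{createDataFrame}} already does.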