Haejoon Lee created SPARK-47543:
-----------------------------------
Summary: Inferring `dict` as `MapType` from Pandas DataFrame to
allow DataFrame creation.
Key: SPARK-47543
URL: https://issues.apache.org/jira/browse/SPARK-47543
Project: Spark
Issue Type: Bug
Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Haejoon Lee
Currently, PyArrow infers a pandas dictionary field as StructType instead
of MapType, so Spark cannot handle the schema properly:
{code:python}
>>> import pandas as pd
>>> import pyarrow as pa
>>> pdf = pd.DataFrame({"str_col": ['second'], "dict_col": [{'first': 0.7, 'second': 0.3}]})
>>> pa.Schema.from_pandas(pdf)
str_col: string
dict_col: struct<first: double, second: double>
  child 0, first: double
  child 1, second: double
{code}
Spark cannot handle this case because it relies on PyArrow for schema creation.
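To make the requested behavior concrete, here is a minimal pure-Python sketch of how a column of `dict` values could be inferred as a map type rather than a struct. The helper `infer_ddl_type` and the `_SCALAR_DDL` table are hypothetical names introduced only for illustration; they are not part of PySpark or PyArrow, and this is not the fix proposed for this issue. The sketch assumes homogeneous key and value types and handles only a few scalar types.

```python
# Hypothetical illustration only: not a PySpark or PyArrow API.
# Assumed mapping from a few Python scalar types to Spark DDL type names.
_SCALAR_DDL = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_ddl_type(values):
    """Return a Spark DDL type string for a column of Python values.

    A column whose non-null values are all dicts is treated as a map,
    with key and value types inferred from all keys and all values.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("cannot infer a type from an empty or all-null column")

    if all(isinstance(v, dict) for v in non_null):
        keys = [k for d in non_null for k in d]
        vals = [x for d in non_null for x in d.values()]
        return f"map<{infer_ddl_type(keys)},{infer_ddl_type(vals)}>"

    kinds = {type(v) for v in non_null}
    if len(kinds) != 1:
        raise TypeError(f"mixed types are not supported in this sketch: {kinds}")
    kind = kinds.pop()
    if kind not in _SCALAR_DDL:
        raise TypeError(f"unsupported type in this sketch: {kind}")
    return _SCALAR_DDL[kind]


# Columns taken from the example in this report.
columns = {"str_col": ["second"], "dict_col": [{"first": 0.7, "second": 0.3}]}
ddl = ", ".join(f"{name} {infer_ddl_type(col)}" for name, col in columns.items())
print(ddl)
```

With this rule the dictionary column is described as a single map of string keys to double values, instead of a struct with one field per key.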
--
This message was sent by Atlassian Jira
(v8.20.10#820010)