[ https://issues.apache.org/jira/browse/SPARK-47543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-47543.
----------------------------------
    Assignee: Haejoon Lee
    Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/45699

> Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-47543
>                 URL: https://issues.apache.org/jira/browse/SPARK-47543
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect, PySpark
>    Affects Versions: 4.0.0
>            Reporter: Haejoon Lee
>            Assignee: Haejoon Lee
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, PyArrow infers a pandas column of Python dictionaries as a StructType
> instead of a MapType, so Spark cannot handle the schema properly:
> {code:python}
> >>> pdf = pd.DataFrame({"str_col": ['second'], "dict_col": [{'first': 0.7, 'second': 0.3}]})
> >>> pa.Schema.from_pandas(pdf)
> str_col: string
> dict_col: struct<first: double, second: double>
>   child 0, first: double
>   child 1, second: double
> {code}
> We cannot handle this case because we use PyArrow for schema creation.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)