[
https://issues.apache.org/jira/browse/SPARK-41987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruifeng Zheng resolved SPARK-41987.
-----------------------------------
Resolution: Resolved
> createDataFrame supports column with map type.
> ----------------------------------------------
>
> Key: SPARK-41987
> URL: https://issues.apache.org/jira/browse/SPARK-41987
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: jiaan.geng
> Priority: Major
>
> Currently, the Connect API's createDataFrame does not support creating a
> DataFrame with a map-type column.
> For example,
> {code:java}
> >>> df = spark.createDataFrame(
> ... [(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (3, None, None)],
> ... ("id", "an_array", "a_map")
> ... )
> {code}
> The code above is meant to create a DataFrame whose 'a_map' column has map type,
> but pyarrow infers {"x": 1.0} as a struct rather than a map.
> pyarrow represents a map value as a list of key/value tuples, e.g. [('x', 1.0)].
> Because the resulting DataFrame schema is incorrect, subsequent operators
> are affected as well.
> For example:
> {code:java}
> from pyspark.sql.functions import posexplode_outer
> df.select("id", "a_map", posexplode_outer("an_array")).show()
> {code}
> Expected:
> {code:java}
> +---+----------+----+----+
> | id|     a_map| pos| col|
> +---+----------+----+----+
> |  1|{x -> 1.0}|   0| foo|
> |  1|{x -> 1.0}|   1| bar|
> |  2|        {}|null|null|
> |  3|      null|null|null|
> +---+----------+----+----+
> {code}
> Got:
> {code:java}
> +---+------+----+----+
> | id| a_map| pos| col|
> +---+------+----+----+
> |  1| {1.0}|   0| foo|
> |  1| {1.0}|   1| bar|
> |  2|{null}|null|null|
> |  3|  null|null|null|
> +---+------+----+----+
> {code}
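
The struct-versus-map mismatch described above comes down to the shape of the Python values handed to pyarrow. The following is a minimal sketch, not the actual change made for SPARK-41987: it converts each dict in the 'a_map' column into the list-of-tuples form that the description says pyarrow accepts for maps. The helper name dict_to_map_entries is hypothetical and introduced only for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


def dict_to_map_entries(
    value: Optional[Dict[Any, Any]],
) -> Optional[List[Tuple[Any, Any]]]:
    """Hypothetical helper: turn a dict into a list of (key, value) tuples.

    None is passed through unchanged so that null map cells stay null.
    """
    if value is None:
        return None
    return list(value.items())


# Same rows as in the issue description.
rows = [(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (3, None, None)]

# Rewrite only the third field ('a_map') of every row.
a_map_entries = [dict_to_map_entries(row[2]) for row in rows]
print(a_map_entries)  # [[('x', 1.0)], [], None]
```

With pyarrow installed, passing a_map_entries to pa.array together with an explicit type such as pa.map_(pa.string(), pa.float64()) should yield a map-typed array, whereas calling pa.array on the raw dicts infers a struct. Whether Spark Connect performs the conversion this way is an assumption, not something stated in the issue.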
--
This message was sent by Atlassian Jira
(v8.20.10#820010)