[ https://issues.apache.org/jira/browse/SPARK-34805 ]


    Joost Farla deleted comment on SPARK-34805:
    -------------------------------------

was (Author: JIRAUSER295969):
[~cloud_fan] I ran into the exact same issue using Spark v3.3.0. It looks like 
the fix was merged into the 3.3 branch (on March 21st) but was not included in 
the v3.3.0 release, and it is not mentioned in the release notes. Is that 
possible? Thanks in advance!

> PySpark loses metadata in DataFrame fields when selecting nested columns
> ------------------------------------------------------------------------
>
>                 Key: SPARK-34805
>                 URL: https://issues.apache.org/jira/browse/SPARK-34805
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.1, 3.1.1
>            Reporter: Mark Ressler
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: jsonMetadataTest.py, nested_columns_metadata.scala
>
>
> When a DataFrame schema contains nested StructTypes whose fields carry 
> metadata, that metadata is lost when the DataFrame selects nested fields. 
> For example, if
> {code:python}
> df.schema.fields[0].dataType.fields[0].metadata
> {code}
> returns a non-empty dictionary, then
> {code:python}
> df.select('Field0.SubField0').schema.fields[0].metadata
> {code}
> returns an empty dictionary, where "Field0" is the name of the first field in 
> the DataFrame and "SubField0" is the name of the first nested field under 
> "Field0".
>  
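
The description above implies that the metadata reachable through the original schema is what the selected column would be expected to carry. A minimal sketch of that lookup follows, written against stand-in classes so it runs without Spark. The names `Field`, `Struct`, and `nested_field_metadata` are hypothetical and not part of PySpark; the stand-ins only mirror the `fields`, `name`, `dataType`, and `metadata` attributes that PySpark's `StructType` and `StructField` expose. The path handling is deliberately naive and assumes field names contain no dots.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Stand-ins mirroring the attribute names of pyspark.sql.types.StructField
# and StructType, so this sketch runs without a Spark installation.
@dataclass
class Field:
    name: str
    dataType: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Struct:
    fields: List[Field]


def nested_field_metadata(schema: Any, path: str) -> Dict[str, Any]:
    """Walk a dotted path such as 'Field0.SubField0' and return a copy of
    the metadata stored on the field the path ends at."""
    current = schema
    found = None
    for part in path.split("."):
        # Non-struct types have no `fields`; treat them as having no children.
        candidates = getattr(current, "fields", [])
        matches = [f for f in candidates if f.name == part]
        if not matches:
            raise KeyError(f"no field named {part!r} in path {path!r}")
        found = matches[0]
        current = found.dataType
    return dict(found.metadata)


# Hypothetical schema shaped like the one in the report.
schema = Struct(
    [Field("Field0", Struct([Field("SubField0", "string", {"comment": "example"})]))]
)
original_metadata = nested_field_metadata(schema, "Field0.SubField0")
```

Because the helper relies only on those four attributes, passing a real `df.schema` in place of the stand-in should work the same way, though that is untested here.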



