[ 
https://issues.apache.org/jira/browse/SPARK-49690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883148#comment-17883148
 ] 

Asif commented on SPARK-49690:
------------------------------

Not sure if this is a bug or expected behaviour.

Given that a UDT can be represented in terms of native Spark data types, it 
seems odd for the schema to show a UDT type and for the Row object to contain 
the UDT instance.

But it seems the ml modules expect the Row objects of a DataFrame to contain 
the UDT instance.

This seems to go against DataFrame behaviour, as a UDT instance should be 
tied to a Dataset.
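
To make the "native Spark data types" point concrete, here is a minimal sketch of the struct form that the issue description below expects. It assumes the UDT's underlying storage type (what `UserDefinedType.sqlType` returns) is exactly the two-field struct quoted in the description. The names `ExpectedSchemaSketch`, `messageStruct` and `expected` are illustrative only and do not come from the issue or the PR.

```scala
import org.apache.spark.sql.types._

object ExpectedSchemaSketch {
  def main(args: Array[String]): Unit = {
    // Assumed underlying struct of the UDT, taken from the issue description.
    val messageStruct = StructType(Seq(
      StructField("intField", IntegerType, nullable = false),
      StructField("stringField", StringType, nullable = false)))

    // Schema in which the "message" field is exposed as the struct itself
    // rather than as the UDT.
    val expected = StructType(Seq(
      StructField("message", messageStruct, nullable = true)))

    // Prints:
    // root
    //  |-- message: struct (nullable = true)
    //  |    |-- intField: integer (nullable = false)
    //  |    |-- stringField: string (nullable = false)
    expected.printTreeString()
  }
}
```

This only builds and prints a plain `StructType`, so it shows the desired tree output, not how the UDT itself would be expanded.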

> UDT type is not expanded into its StructType in the schema definition
> ---------------------------------------------------------------------
>
>                 Key: SPARK-49690
>                 URL: https://issues.apache.org/jira/browse/SPARK-49690
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Asif
>            Priority: Major
>              Labels: SQL, pull-request-available
>
> A UDT type field does not show up as its constituent struct type in the schema. 
> Instead, it shows up as the UserDefinedType class.
> For example, it shows up as
> root
>  |-- message: test (nullable = true)
>  
> But message is a field of a UDT type whose schema is represented as 
> StructField("intField", IntegerType, nullable = false),
> StructField("stringField", StringType, nullable = false)))
> so the message field should be a struct type, with the schema
> root
>  |-- message: struct (nullable = true)
>  |    |-- intField: integer (nullable = false)
>  |    |-- stringField: string (nullable = false)
>  
> I will be opening a PR along with a bug test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
