[ https://issues.apache.org/jira/browse/SPARK-30319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enrico Minack updated SPARK-30319:
----------------------------------
    Fix Version/s:     (was: 3.0.0)

> Adds a stricter version of as[T]
> --------------------------------
>
>                 Key: SPARK-30319
>                 URL: https://issues.apache.org/jira/browse/SPARK-30319
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Enrico Minack
>            Priority: Major
>
> The behaviour of {{as[T]}} is not intuitive when you read code like 
> {{df.as[T].write.csv("data.csv")}}. The result depends on the actual schema of 
> {{df}}, even though {{def as[T]: Dataset[T]}} suggests it is agnostic to the 
> schema of {{df}}. In particular, {{as[T]}} falls short of the expected 
> behaviour in several ways (see the illustration after this list):
>  * Extra columns that are not part of the type {{T}} are not dropped.
>  * The order of columns is not aligned with the schema of {{T}}.
>  * Columns are not cast to the types of {{T}}'s fields; they have to be cast 
> explicitly.
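> A minimal illustration, assuming a {{SparkSession}} with {{spark.implicits._}} 
> in scope; the case class and column names are examples only:
> {code:scala}
> import spark.implicits._
>
> case class Pair(id: Long, value: String)
>
> // The DataFrame has an extra column, a different column order,
> // and "id" typed as Int rather than Long.
> val df = Seq(("a", 1, true)).toDF("value", "id", "extra")
>
> // as[Pair] analyses fine, but the written file still contains the
> // columns value, id, extra in their original order and types.
> df.as[Pair].write.csv("data.csv")
> {code}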
> A method that enforces the schema of {{T}} on a given {{Dataset}} would be very 
> convenient and would allow you to articulate and guarantee the above assumptions 
> about your data with the native Spark Dataset API. Such a method would play a 
> more explicit and enforcing role than {{as[T]}} with respect to columns, column 
> order and column types.
> Possible names for a stricter version of {{as[T]}}:
>  * {{as[T](strict = true)}}
>  * {{toDS[T]}} (as in {{toDF}})
>  * {{selectAs[T]}} (as this merely selects the columns of schema {{T}})
> The name {{toDS[T]}} is chosen here.
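> A minimal sketch of the intended semantics, written as a user-side extension; 
> the object and class names below are hypothetical and not part of the Spark API 
> or of this proposal:
> {code:scala}
> import org.apache.spark.sql.{Dataset, Encoder}
> import org.apache.spark.sql.functions.col
>
> object StrictDatasetOps {
>   implicit class StrictDataset[R](ds: Dataset[R]) {
>     // Select exactly the columns of T's schema, in order, casting each to
>     // the declared field type; columns not belonging to T are dropped.
>     def toDS[T](implicit enc: Encoder[T]): Dataset[T] = {
>       val columns = enc.schema.fields.map(f => col(f.name).cast(f.dataType))
>       ds.select(columns: _*).as[T]
>     }
>   }
> }
>
> // Usage (with the example above): df.toDS[Pair] yields only the columns
> // id, value with id cast to Long, so df.toDS[Pair].write.csv("data.csv")
> // no longer depends on the incoming schema of df.
> {code}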



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
