[ 
https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363127#comment-16363127
 ] 

Apache Spark commented on SPARK-23418:
--------------------------------------

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/20603

> DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-23418
>                 URL: https://issues.apache.org/jira/browse/SPARK-23418
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
>
> DataSourceV2 currently does not reject user-specified schemas when a source 
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's 
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I 
> can't think of a situation where it is a good idea to pass a schema that is 
> ignored.
>
> Here's an example of how this will be confusing: think of a job that supplies 
> a schema identical to the table's schema and runs fine, so it goes into 
> production. What happens when the table's schema changes? If someone adds a 
> column to the table, then the job will start failing and report that the 
> source doesn't support user-supplied schemas, even though it had previously 
> worked just fine with a user-supplied schema. In addition, the change to the 
> table is actually compatible with the old job because the new column will be 
> removed by a projection.
>
> To fix this situation, it may be tempting to use the user-supplied schema as 
> an initial projection. But that doesn't make sense because we don't need two 
> projection mechanisms. If we used this as a second way to project, it would 
> be confusing that you can't actually leave out columns (at least for CSV) and 
> it would be odd that using this path you can coerce types, which should 
> usually be done by Spark.
>
> I think it is best not to allow a user-supplied schema when it isn't 
> supported by a source.
> {quote}
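
The check proposed in the description can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the code in the linked
pull request: the two nested interfaces are simplified, hypothetical stand-ins
for Spark's ReadSupport and ReadSupportWithSchema (the real ones take Spark
types such as StructType and DataSourceOptions), a schema is modeled as a list
of column names, and SchemaCheckSketch, FixedSchemaSource and
SchemaAwareSource are made-up names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class SchemaCheckSketch {
    // Hypothetical, simplified stand-ins for the DataSourceV2 mix-in interfaces.
    interface ReadSupport {
        String createReader();
    }

    interface ReadSupportWithSchema {
        String createReader(List<String> schema);
    }

    // A source that only knows its own schema.
    static final class FixedSchemaSource implements ReadSupport {
        public String createReader() {
            return "reader with the source's own schema";
        }
    }

    // A source that accepts a schema from the user.
    static final class SchemaAwareSource implements ReadSupportWithSchema {
        public String createReader(List<String> schema) {
            return "reader with schema " + schema;
        }
    }

    // Reject a user-specified schema up front unless the source supports one,
    // regardless of whether it happens to match the source's current schema.
    static String createReader(Object source, Optional<List<String>> userSchema) {
        String name = source.getClass().getSimpleName();
        if (userSchema.isPresent()) {
            if (source instanceof ReadSupportWithSchema) {
                return ((ReadSupportWithSchema) source).createReader(userSchema.get());
            }
            throw new IllegalArgumentException(
                name + " does not support user-specified schemas");
        }
        if (source instanceof ReadSupport) {
            return ((ReadSupport) source).createReader();
        }
        throw new IllegalArgumentException(name + " requires a user-specified schema");
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "name");
        System.out.println(createReader(new SchemaAwareSource(), Optional.of(schema)));
        System.out.println(createReader(new FixedSchemaSource(), Optional.empty()));
        try {
            createReader(new FixedSchemaSource(), Optional.of(schema));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the rejection depends only on which interface the source implements,
a job that passes a schema fails from its first run rather than only after
the table's schema later changes.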



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
