[ https://issues.apache.org/jira/browse/SPARK-23418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363127#comment-16363127 ]
Apache Spark commented on SPARK-23418:
--------------------------------------
User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/20603
> DataSourceV2 should not allow userSpecifiedSchema without
> ReadSupportWithSchema
> -------------------------------------------------------------------------------
>
> Key: SPARK-23418
> URL: https://issues.apache.org/jira/browse/SPARK-23418
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Ryan Blue
> Priority: Major
>
> DataSourceV2 currently does not reject user-specified schemas when a source
> does not implement ReadSupportWithSchema. This is confusing behavior. Here's
> a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I
> can't think of a situation where it is a good idea to pass a schema that is
> ignored.
>
> Here's an example of how this will be confusing: think of a job that supplies
> a schema identical to the table's schema and runs fine, so it goes into
> production. What happens when the table's schema changes? If someone adds a
> column to the table, then the job will start failing and report that the
> source doesn't support user-supplied schemas, even though it had previously
> worked just fine with a user-supplied schema. In addition, the change to the
> table is actually compatible with the old job because the new column will be
> removed by a projection.
>
> To fix this situation, it may be tempting to use the user-supplied schema as
> an initial projection. But that doesn't make sense because we don't need two
> projection mechanisms. If we used this as a second way to project, it would
> be confusing that you can't actually leave out columns (at least for CSV) and
> it would be odd that using this path you can coerce types, which should
> usually be done by Spark.
>
> I think it is best not to allow a user-supplied schema when it isn't
> supported by a source.
> {quote}
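
The scenario in the quote can be sketched with a small stand-alone model. This is not Spark code: `FixedSchemaSource`, `load_lenient`, `load_strict`, and `SchemaNotSupported` are hypothetical names, and a schema is reduced to a list of column names. `load_lenient` models the behavior the quote objects to (a user schema is tolerated only while it happens to match the source), and `load_strict` models the proposal (a user schema is always rejected when the source cannot use it).

```python
class SchemaNotSupported(Exception):
    """Raised when a source cannot accept a user-specified schema."""


class FixedSchemaSource:
    """Hypothetical source that always reports its own schema.

    Stands in for a source that does not implement ReadSupportWithSchema.
    """

    accepts_user_schema = False

    def __init__(self, columns):
        self.columns = list(columns)

    def read_schema(self):
        return list(self.columns)


def load_lenient(source, user_schema=None):
    """Accept a user schema only if it matches the source's schema exactly."""
    if user_schema is not None and not source.accepts_user_schema:
        if list(user_schema) != source.read_schema():
            raise SchemaNotSupported("source does not allow user-specified schemas")
    return source.read_schema()


def load_strict(source, user_schema=None):
    """Always reject a user schema when the source cannot use it."""
    if user_schema is not None and not source.accepts_user_schema:
        raise SchemaNotSupported("source does not allow user-specified schemas")
    return source.read_schema()


def fails(load, source, user_schema):
    """Return True if the load is rejected, False if it succeeds."""
    try:
        load(source, user_schema)
    except SchemaNotSupported:
        return True
    return False


job_schema = ["id", "name"]
table = FixedSchemaSource(["id", "name"])

# Lenient check: the job works while the schemas happen to be identical ...
lenient_before = fails(load_lenient, table, job_schema)

# ... and starts failing once someone adds a column to the table.
table.columns.append("created_at")
lenient_after = fails(load_lenient, table, job_schema)

# Strict check: the job is rejected up front, before it reaches production.
strict_before = fails(load_strict, FixedSchemaSource(["id", "name"]), job_schema)
```

Under the lenient check the failure appears only after an otherwise compatible table change; under the strict check the same job is rejected the first time it runs, which is the outcome the issue argues for.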