[jira] [Resolved] (SPARK-27269) File source v2 should validate data schema only

Hyukjin Kwon (JIRA) Tue, 26 Mar 2019 16:00:27 -0700


[https://issues.apache.org/jira/browse/SPARK-27269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]


Hyukjin Kwon resolved SPARK-27269.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 24203
[https://github.com/apache/spark/pull/24203]

> File source v2 should validate data schema only
> -----------------------------------------------
>
>                 Key: SPARK-27269
>                 URL: https://issues.apache.org/jira/browse/SPARK-27269
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, File source v2 allows each data source to specify its supported
> data types by implementing the method `supportsDataType` in `FileScan` and
> `FileWriteBuilder`.
> However, in the read path, the validation checks every data type in
> `readSchema`, which may include partition columns. This is a regression:
> for example, the Text data source only supports the String data type, but
> its partition columns can still be of Integer type, because partition
> column values are handled by Spark rather than read from the data files.
> This PR does the following:
> 1. Refactors schema validation so that only the data schema is checked.
> 2. Filters out partition columns from the data schema when a user-specified
> schema is provided.
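
The difference between the old check (whole read schema) and the new one (data schema only) can be sketched in plain Python. This is a minimal illustration, not Spark code: the `Field` type and the `supports_string_only`, `data_schema` and `validate` helpers are hypothetical stand-ins for Spark's schema types and the `supportsDataType` hook, and partition columns are matched by exact name as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema column (name plus type name)."""
    name: str
    data_type: str  # e.g. "string", "int"


def supports_string_only(data_type: str) -> bool:
    """Hypothetical analogue of a Text source's type check: strings only."""
    return data_type == "string"


def data_schema(read_schema: Sequence[Field],
                partition_columns: Sequence[str]) -> List[Field]:
    """Keep only the columns stored in the files; partition values come
    from directory paths instead. Names are matched exactly (assumption)."""
    partitions = set(partition_columns)
    return [f for f in read_schema if f.name not in partitions]


def validate(schema: Sequence[Field],
             supports: Callable[[str], bool]) -> None:
    """Raise ValueError listing every column whose type is unsupported."""
    bad = [f"{f.name}: {f.data_type}" for f in schema if not supports(f.data_type)]
    if bad:
        raise ValueError("unsupported data types: " + ", ".join(bad))


# A text table partitioned by an integer `year` column.
read_schema = [Field("value", "string"), Field("year", "int")]

# Old behaviour: checking the whole read schema rejects the int partition column.
try:
    validate(read_schema, supports_string_only)
except ValueError as e:
    print(e)  # unsupported data types: year: int

# New behaviour: only the data schema is checked, so validation passes.
validate(data_schema(read_schema, ["year"]), supports_string_only)
print("data schema ok")
```

Separating "which columns live in the files" from "which types the format can store" keeps each source's type check simple: the source only ever sees the columns it actually reads or writes.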



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
