[jira] [Resolved] (SPARK-25175) Field resolution should fail if there's ambiguity for ORC native reader

Dongjoon Hyun (JIRA) Sun, 09 Sep 2018 19:25:38 -0700


     [ https://issues.apache.org/jira/browse/SPARK-25175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Dongjoon Hyun resolved SPARK-25175.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0
                   3.0.0

Issue resolved by pull request 22262
[https://github.com/apache/spark/pull/22262]

> Field resolution should fail if there's ambiguity for ORC native reader
> -----------------------------------------------------------------------
>
>                 Key: SPARK-25175
>                 URL: https://issues.apache.org/jira/browse/SPARK-25175
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Chenxiao Mao
>            Assignee: Chenxiao Mao
>            Priority: Major
>             Fix For: 3.0.0, 2.4.0
>
>
> SPARK-25132 adds support for case-insensitive field resolution when reading 
> from Parquet files. We found that ORC files have similar, though not 
> identical, issues. Spark has two OrcFileFormat implementations.
>  * Since SPARK-2883, Spark supports ORC inside the sql/hive module with a Hive 
> dependency. This hive OrcFileFormat always does case-insensitive field 
> resolution regardless of case sensitivity mode. When there is ambiguity, hive 
> OrcFileFormat always returns the first matched field, rather than failing the 
> read operation.
>  * SPARK-20682 adds a new ORC data source inside sql/core. This native 
> OrcFileFormat supports case-insensitive field resolution; however, it cannot 
> handle duplicate fields.
> Besides data source tables, hive serde tables also have issues. If an ORC data 
> file has more fields than the table schema, we just can't read hive serde 
> tables. If the ORC data file does not have more fields, hive serde tables 
> always do field resolution by ordinal, rather than by name.
> Both the ORC data source hive impl and hive serde tables rely on the hive orc 
> InputFormat/SerDe to read tables. I'm not sure whether we can change the 
> underlying hive classes to make all orc read behaviors consistent.
> This ticket aims to make the read behavior of the ORC data source native impl 
> consistent with the Parquet data source.
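
The resolution rule the ticket asks for can be illustrated with a minimal sketch in plain Python. This is not Spark's implementation: the function name `resolve_field`, its parameters, and the error message wording are hypothetical and chosen only for illustration. It assumes that case-sensitive mode matches names exactly, and that case-insensitive mode fails when more than one file field matches instead of silently returning the first match.

```python
from typing import List, Optional


def resolve_field(requested: str, file_fields: List[str],
                  case_sensitive: bool) -> Optional[int]:
    """Return the index of the file field matching `requested`, or None.

    Hypothetical helper for illustration; not part of Spark.
    """
    if case_sensitive:
        # Exact name match only; "a" and "A" are different fields.
        return file_fields.index(requested) if requested in file_fields else None

    # Case-insensitive mode: collect every field whose lowercased name matches.
    wanted = requested.lower()
    matches = [i for i, name in enumerate(file_fields) if name.lower() == wanted]

    if len(matches) > 1:
        # Ambiguity: fail the read rather than pick the first match.
        names = [file_fields[i] for i in matches]
        raise ValueError(
            f"Found duplicate field(s) '{requested}': {names} "
            "in case-insensitive mode"
        )
    return matches[0] if matches else None
```

With file fields `["a", "A"]`, a case-sensitive lookup of `"a"` returns index 0, while the same lookup in case-insensitive mode raises an error. A "first matched field" policy, as described above for the hive OrcFileFormat, would return index 0 in both modes.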



