[jira] [Resolved] (SPARK-16628) OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files

Wenchen Fan (JIRA) Fri, 13 Oct 2017 08:26:55 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-16628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wenchen Fan resolved SPARK-16628.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0
                   2.2.1

Issue resolved by pull request 19470
[https://github.com/apache/spark/pull/19470]

> OrcConversions should not convert an ORC table represented by 
> MetastoreRelation to HadoopFsRelation if metastore schema does not match 
> schema stored in ORC files
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16628
>                 URL: https://issues.apache.org/jira/browse/SPARK-16628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>             Fix For: 2.2.1, 2.3.0
>
>
> When {{spark.sql.hive.convertMetastoreOrc}} is enabled, we convert an ORC 
> table represented by a MetastoreRelation to a HadoopFsRelation that uses 
> Spark's OrcFileFormat internally. The conversion aims to improve table scan 
> performance, since the runtime code path for scanning a HadoopFsRelation is 
> faster. However, OrcFileFormat's implementation assumes that ORC files store 
> their schema with correct column names, and before Hive 2.0 an ORC table 
> created by Hive does not store column names correctly in the ORC files 
> (HIVE-4243). So, for this kind of ORC dataset, we cannot safely convert the 
> code path. Right now, if ORC tables were created by Hive 1.x or 0.x, enabling 
> {{spark.sql.hive.convertMetastoreOrc}} causes a runtime exception for 
> non-partitioned ORC tables and drops the metastore schema for partitioned ORC 
> tables.
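
The mismatch described above can be illustrated with a small, self-contained sketch. It is not taken from pull request 19470 and does not use any Spark API; the function names are hypothetical. It assumes that ORC files written by Hive before 2.0 carry positional placeholder names of the form _col0, _col1, ... (the HIVE-4243 behavior), and that the metastore column list passed in contains only data columns, not partition columns.

```python
import re
from typing import Sequence

# Assumption: Hive releases before 2.0 wrote positional placeholders
# (_col0, _col1, ...) instead of real column names into ORC files (HIVE-4243).
_PLACEHOLDER = re.compile(r"^_col(\d+)$")


def has_placeholder_names(orc_columns: Sequence[str]) -> bool:
    """Return True if every physical column name is a positional placeholder."""
    return bool(orc_columns) and all(_PLACEHOLDER.match(c) for c in orc_columns)


def safe_to_convert(metastore_columns: Sequence[str],
                    orc_columns: Sequence[str]) -> bool:
    """Hypothetical guard: allow the conversion only when the names stored in
    the ORC file agree with the data columns recorded in the metastore."""
    if has_placeholder_names(orc_columns):
        return False
    # Hive column names are case-insensitive, so compare in lower case.
    return ([c.lower() for c in metastore_columns]
            == [c.lower() for c in orc_columns])


if __name__ == "__main__":
    # File written with real column names: conversion would be safe.
    print(safe_to_convert(["id", "name"], ["id", "name"]))
    # File written by Hive 1.x or 0.x: names are placeholders, so skip it.
    print(safe_to_convert(["id", "name"], ["_col0", "_col1"]))
```

A guard of this shape only decides whether to skip the conversion. On affected versions, leaving {{spark.sql.hive.convertMetastoreOrc}} disabled should likewise keep such tables on the Hive SerDe read path.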



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
