[
https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345036#comment-15345036
]
Nic Eggert commented on SPARK-15705:
------------------------------------
Raised priority to critical. We have many partitioned tables stored as ORC, so
DataFrames/Datasets in 2.0 are effectively unusable for us until this is
fixed.
> Spark won't read ORC schema from metastore for partitioned tables
> -----------------------------------------------------------------
>
> Key: SPARK-15705
> URL: https://issues.apache.org/jira/browse/SPARK-15705
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)
> Reporter: Nic Eggert
> Priority: Critical
>
> Spark does not seem to read the schema from the Hive metastore for
> partitioned tables stored as ORC files. It appears to read the schema from
> the files themselves, which, if they were created with Hive, does not match
> the metastore schema (at least not before Hive 2.0; see HIVE-4243). To
> reproduce:
> In Hive:
> {code}
> hive> create table default.test (id BIGINT, name STRING) partitioned by
> (state STRING) stored as orc;
> hive> insert into table default.test partition (state="CA") values (1,
> "mike"), (2, "steve"), (3, "bill");
> {code}
> In Spark:
> {code}
> scala> spark.table("default.test").printSchema
> {code}
> Expected result: Spark should preserve the column names that were defined in
> Hive.
> Actual result:
> {code}
> root
> |-- _col0: long (nullable = true)
> |-- _col1: string (nullable = true)
> |-- state: string (nullable = true)
> {code}
> Possibly related to SPARK-14959?
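> A possible interim workaround (a sketch, not a confirmed fix for 2.0.0): the
> `spark.sql.hive.convertMetastoreOrc` setting controls whether Spark converts
> Hive ORC tables to its native ORC reader. Assuming the bogus `_col0`/`_col1`
> names come from that native path reading the file footers, disabling the
> conversion so the table goes through the Hive SerDe may preserve the
> metastore column names. Untested against this exact bug:

```scala
// Sketch of a possible workaround: disable Spark's conversion of Hive ORC
// tables to the native ORC reader, so reads go through the Hive SerDe and
// (assuming the SerDe path honors the metastore) keep the declared columns.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

// Re-read the table; with the conversion disabled, the schema should come
// from the metastore (id, name, state) rather than the file-level _colN names.
spark.table("default.test").printSchema
```

> Whether this avoids the bug, and at what performance cost (the Hive SerDe
> path is generally slower than the native reader), would need to be verified.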
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)