[
https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345036#comment-15345036
]
Nic Eggert commented on SPARK-15705:
------------------------------------
Raised priority to critical. We have many partitioned tables stored as ORC, so
DataFrames/Datasets in 2.0 are effectively unusable for us until this is
fixed.
> Spark won't read ORC schema from metastore for partitioned tables
> -----------------------------------------------------------------
>
> Key: SPARK-15705
> URL: https://issues.apache.org/jira/browse/SPARK-15705
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)
> Reporter: Nic Eggert
> Priority: Critical
>
> Spark does not seem to read the schema from the Hive metastore for
> partitioned tables stored as ORC files. It appears to read the schema from
> the files themselves, which, if they were created with Hive, does not match
> the metastore schema (at least not before Hive 2.0; see HIVE-4243). To
> reproduce:
> In Hive:
> {code}
> hive> create table default.test (id BIGINT, name STRING) partitioned by
> (state STRING) stored as orc;
> hive> insert into table default.test partition (state="CA") values (1,
> "mike"), (2, "steve"), (3, "bill");
> {code}
> In Spark:
> {code}
> scala> spark.table("default.test").printSchema
> {code}
> Expected result: Spark should preserve the column names that were defined in
> Hive.
> Actual result:
> {code}
> root
> |-- _col0: long (nullable = true)
> |-- _col1: string (nullable = true)
> |-- state: string (nullable = true)
> {code}
> Possibly related to SPARK-14959?
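> A possible interim workaround (a sketch, not a confirmed fix for 2.0.0): the
> `spark.sql.hive.convertMetastoreOrc` setting controls whether Spark converts
> Hive ORC tables to its native ORC reader. Assuming the bogus `_col0`/`_col1`
> names come from that native path reading the file footers, disabling the
> conversion so the table goes through the Hive SerDe may preserve the
> metastore column names. Untested against this exact bug:

```scala
// Sketch of a possible workaround: disable Spark's conversion of Hive ORC
// tables to the native ORC reader, so reads go through the Hive SerDe and
// (assuming the SerDe path honors the metastore) keep the declared columns.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")

// Re-read the table; with the conversion disabled, the schema should come
// from the metastore (id, name, state) rather than the file-level _colN names.
spark.table("default.test").printSchema
```

> Whether this avoids the bug, and at what performance cost (the Hive SerDe
> path is generally slower than the native reader), would need to be verified.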
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)