Nic Eggert created SPARK-15705:
----------------------------------
Summary: Spark won't read ORC schema from metastore for
partitioned tables
Key: SPARK-15705
URL: https://issues.apache.org/jira/browse/SPARK-15705
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)
Reporter: Nic Eggert
Spark does not read the schema from the Hive metastore for partitioned
tables stored as ORC files. It appears to read the schema from the ORC files
themselves, which, if they were written by Hive, does not match the metastore
schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:
In Hive:
{code}
hive> create table default.test (id BIGINT, name STRING) partitioned by (state
STRING) stored as orc;
hive> insert into table default.test partition (state="CA") values (1, "mike"),
(2, "steve"), (3, "bill");
{code}
In Spark:
{code}
scala> spark.table("default.test").printSchema
{code}
Expected result: Spark should preserve the column names that were defined in
Hive.
Actual result:
{code}
root
|-- _col0: long (nullable = true)
|-- _col1: string (nullable = true)
|-- state: string (nullable = true)
{code}
Possibly related to SPARK-14959?
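A possible workaround, sketched below, is to disable Spark's native ORC conversion so the table is read through the Hive SerDe, which takes its schema from the metastore. {{spark.sql.hive.convertMetastoreOrc}} is a real Spark SQL config; whether flipping it avoids this bug in the reported environment is an assumption, not verified here.

{code}
// Hedged workaround sketch: with convertMetastoreOrc disabled, Spark reads
// Hive ORC tables via the Hive SerDe instead of the native ORC file reader,
// so the metastore column names (id, name) should be reported rather than
// the _col0/_col1 names stored in the files.
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
scala> spark.table("default.test").printSchema
{code}

Note that this trades the native reader's performance (e.g. vectorization, filter pushdown) for schema correctness, so it is a stopgap rather than a fix.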
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]