Nic Eggert created SPARK-15705:
----------------------------------

             Summary: Spark won't read ORC schema from metastore for 
partitioned tables
                 Key: SPARK-15705
                 URL: https://issues.apache.org/jira/browse/SPARK-15705
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
         Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)
            Reporter: Nic Eggert


Spark does not seem to read the schema from the Hive metastore for partitioned 
tables stored as ORC files. It appears to read the schema from the files 
themselves, which, if they were created with Hive, does not match the metastore 
schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:

In Hive:
{code}
hive> create table default.test (id BIGINT, name STRING)
    >     partitioned by (state STRING) stored as orc;
hive> insert into table default.test partition (state="CA")
    >     values (1, "mike"), (2, "steve"), (3, "bill");
{code}

In Spark:
{code}
scala> spark.table("default.test").printSchema
{code}

Expected result: Spark should preserve the column names that were defined in 
Hive.

Actual result:
{code}
root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)
 |-- state: string (nullable = true)
{code}

Possibly related to SPARK-14959?
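As a diagnostic, it may be worth checking whether both ORC read paths exhibit the {{\_colN}} schema. Toggling {{spark.sql.hive.convertMetastoreOrc}} switches between Spark's native ORC reader and the Hive SerDe path (the config is real; whether either path returns the correct schema here is an assumption, not something I have verified):

{code}
scala> // Diagnostic only: compare the two read paths, not a confirmed fix.
scala> // true  => Spark's native ORC reader
scala> // false => Hive SerDe read path
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
scala> spark.table("default.test").printSchema
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
scala> spark.table("default.test").printSchema
{code}

If only one path shows the correct column names, that would narrow down where the metastore schema is being dropped.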



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
