[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940763#comment-14940763 ]

Yin Huai commented on SPARK-9762:
---------------------------------

[~simeons] Different versions of Hive have different internal restrictions, so 
it is not always possible to store table metadata in a Hive-compatible way. For 
example, if the metastore uses Hive 0.13, Hive rejects the create table call 
for a Parquet table when any column has a binary or decimal type. So, to save 
the table's metadata at all, we have to work around the restriction and store 
the metadata in a way that is not Hive-compatible.
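
For illustration, here is a minimal Scala sketch (Spark 1.4-era API; the 
{{prices}} table and its column are hypothetical) of the kind of table whose 
schema hits the Hive 0.13 restriction:

{code}
// Hypothetical sketch against a 1.4-era HiveContext; "sc" is an existing SparkContext.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// A Parquet table with a decimal column: a Hive 0.13 metastore rejects the
// corresponding create table call, so Spark SQL falls back to recording the
// schema in table properties rather than as Hive-compatible columns.
val df = sqlContext.sql("SELECT CAST(1.5 AS DECIMAL(10, 2)) AS price")
df.write.format("parquet").saveAsTable("prices")
{code}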

The reason you see two different outputs for describe table and show columns 
is that Spark SQL implements describe table natively, but we still delegate 
the show columns command to Hive. Because the metadata is not Hive-compatible, 
the show columns command gives you different output.
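
Continuing the hypothetical {{prices}} example from above, the divergence 
looks like this:

{code}
// DESCRIBE is answered natively by Spark SQL from the schema it recorded,
// while SHOW COLUMNS is passed through to Hive, which only sees the
// non-Hive-compatible metadata, so the two results can disagree.
sqlContext.sql("DESCRIBE prices").show()
sqlContext.sql("SHOW COLUMNS IN prices").show()
{code}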

We have been gradually adding native support for more kinds of commands. If 
there are any specific commands that are important to your use cases, please 
feel free to create JIRAs.

> ALTER TABLE cannot find column
> ------------------------------
>
>                 Key: SPARK-9762
>                 URL: https://issues.apache.org/jira/browse/SPARK-9762
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>         Environment: Ubuntu on AWS
>            Reporter: Simeon Simeonov
>
> {{ALTER TABLE tbl CHANGE}} cannot find a column that {{DESCRIBE}} lists.
> In the case of a table generated with {{HiveContext.read.json()}}, the output 
> of {{DESCRIBE dimension_components}} is:
> {code}
> comp_config	struct<adText:string,adTextLeft:string,background:string,brand:string,button_color:string,cta_side:string,cta_type:string,depth:string,fixed_under:string,light:string,mid_text:string,oneline:string,overhang:string,shine:string,style:string,style_secondary:string,style_small:string,type:string>
> comp_criteria string
> comp_data_model       string
> comp_dimensions	struct<data:string,integrations:array<string>,template:string,variation:bigint>
> comp_disabled boolean
> comp_id       bigint
> comp_path     string
> comp_placementData    struct<mod:string>
> comp_slot_types       array<string>
> {code}
> However, {{alter table dimension_components change comp_dimensions comp_dimensions struct<data:string,integrations:array<string>,template:string,variation:bigint,z:string>;}} fails with:
> {code}
> 15/08/08 23:13:07 ERROR exec.DDLTask: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Invalid column reference 
> comp_dimensions
>       at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3584)
>       at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:312)
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>       at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>       at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>       at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
>       at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
>       at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
>       at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
>       at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
>       at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
> ...
> {code}
> Meanwhile, {{SHOW COLUMNS in dimension_components}} lists two columns: 
> {{col}} (which does not exist in the table) and {{z}}, which was just added.
> This suggests that DDL operations in Spark SQL use table metadata 
> inconsistently.
> Full spark-sql output [here|https://gist.github.com/ssimeonov/636a25d6074a03aafa67].


