[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940763#comment-14940763 ]
Yin Huai commented on SPARK-9762:
---------------------------------

[~simeons] Different versions of Hive have different internal restrictions, so it is not always possible to store the metadata in a Hive-compatible way. For example, if the metastore uses Hive 0.13, Hive will reject the create table call for a Parquet table if any of its columns has a binary or decimal type. So, to save the table's metadata, we have to work around this and save it in a way that is not compatible with Hive.

The reason you see different output for describe table and show columns is that Spark SQL has implemented describe table itself, but we still delegate the show columns command to Hive. Because the metadata is not Hive-compatible, the show columns command gives you a different output.

We have been gradually adding native support for more kinds of commands. If there are any specific commands that are important to your use cases, please feel free to create JIRAs.

> ALTER TABLE cannot find column
> ------------------------------
>
> Key: SPARK-9762
> URL: https://issues.apache.org/jira/browse/SPARK-9762
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1
> Environment: Ubuntu on AWS
> Reporter: Simeon Simeonov
>
> {{ALTER TABLE tbl CHANGE}} cannot find a column that {{DESCRIBE COLUMN}} lists.
> In the case of a table generated with {{HiveContext.read.json()}}, the output of {{DESCRIBE dimension_components}} is:
> {code}
> comp_config struct<adText:string,adTextLeft:string,background:string,brand:string,button_color:string,cta_side:string,cta_type:string,depth:string,fixed_under:string,light:string,mid_text:string,oneline:string,overhang:string,shine:string,style:string,style_secondary:string,style_small:string,type:string>
> comp_criteria string
> comp_data_model string
> comp_dimensions struct<data:string,integrations:array<string>,template:string,variation:bigint>
> comp_disabled boolean
> comp_id bigint
> comp_path string
> comp_placementData struct<mod:string>
> comp_slot_types array<string>
> {code}
> However, {{alter table dimension_components change comp_dimensions comp_dimensions struct<data:string,integrations:array<string>,template:string,variation:bigint,z:string>;}} fails with:
> {code}
> 15/08/08 23:13:07 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid column reference comp_dimensions
>     at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3584)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:312)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>     at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
>     at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
>     at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
>     at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
>     at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
> ...
> {code}
> Meanwhile, {{SHOW COLUMNS in dimension_components}} lists two columns: {{col}} (which does not exist in the table) and {{z}}, which was just added.
> This suggests that DDL operations in Spark SQL use table metadata inconsistently.
> Full spark-sql output [here|https://gist.github.com/ssimeonov/636a25d6074a03aafa67].
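
The mismatch described in the comment, where describe table is handled by Spark SQL while show columns is delegated to Hive, can be illustrated with a toy model. This is only a sketch and not Spark or Hive code: `MetastoreEntry`, `save_table`, `describe_table`, `show_columns`, and the `real_schema` property key are all hypothetical names, and the placeholder column is an assumption modeled on the {{col}} column the reporter observed.

```python
from dataclasses import dataclass, field

# Toy model only: none of these names are Spark or Hive APIs.
REJECTED_BY_HIVE = {"binary", "decimal"}  # types the comment says Hive 0.13 rejects for Parquet
PLACEHOLDER = ("col", "array<string>")    # hypothetical stand-in column shown to Hive


@dataclass
class MetastoreEntry:
    hive_columns: list                              # what Hive itself sees
    properties: dict = field(default_factory=dict)  # free-form table properties


def base_type(type_name):
    """Strip parameters such as the (10,2) in decimal(10,2)."""
    return type_name.split("(")[0].strip().lower()


def save_table(schema):
    """Store a list of (name, type) pairs, falling back to a workaround."""
    if all(base_type(t) not in REJECTED_BY_HIVE for _, t in schema):
        # Hive-compatible path: Hive's column list is the source of truth.
        return MetastoreEntry(hive_columns=list(schema))
    # Workaround path: Hive gets a placeholder, the real schema goes in properties.
    return MetastoreEntry(
        hive_columns=[PLACEHOLDER],
        properties={"real_schema": list(schema)},
    )


def describe_table(entry):
    """Native path: prefers the schema kept in the table properties."""
    return entry.properties.get("real_schema", entry.hive_columns)


def show_columns(entry):
    """Delegated path: sees only Hive's own column list."""
    return [name for name, _ in entry.hive_columns]


entry = save_table([("id", "bigint"), ("payload", "binary")])
print(describe_table(entry))  # [('id', 'bigint'), ('payload', 'binary')]
print(show_columns(entry))    # ['col']
```

In this model, any command that reads only `hive_columns` (as a delegated ALTER TABLE would) cannot see the real column names, which is consistent with the "Invalid column reference" error in the report.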