[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940082#comment-14940082 ]
Yin Huai commented on SPARK-9762:
---------------------------------

[~simeons] For Spark SQL's data source tables, if you are adding new columns, I think {{hiveContext.refreshTable}} or {{sql("REFRESH TABLE ...")}} is the way to let it pick up new columns. Can you try it? Basically, for self-describing data sources like JSON, Parquet, and ORC, we try to give you the latest schema automatically without requiring users to set the schema explicitly (sometimes you need to call the refresh table method because we cache metadata on the driver side to reduce query planning time).

> ALTER TABLE cannot find column
> ------------------------------
>
>                 Key: SPARK-9762
>                 URL: https://issues.apache.org/jira/browse/SPARK-9762
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>        Environment: Ubuntu on AWS
>            Reporter: Simeon Simeonov
>
> {{ALTER TABLE tbl CHANGE}} cannot find a column that {{DESCRIBE COLUMN}} lists.
>
> In the case of a table generated with {{HiveContext.read.json()}}, the output of {{DESCRIBE dimension_components}} is:
>
> {code}
> comp_config          struct<adText:string,adTextLeft:string,background:string,brand:string,button_color:string,cta_side:string,cta_type:string,depth:string,fixed_under:string,light:string,mid_text:string,oneline:string,overhang:string,shine:string,style:string,style_secondary:string,style_small:string,type:string>
> comp_criteria        string
> comp_data_model      string
> comp_dimensions      struct<data:string,integrations:array<string>,template:string,variation:bigint>
> comp_disabled        boolean
> comp_id              bigint
> comp_path            string
> comp_placementData   struct<mod:string>
> comp_slot_types      array<string>
> {code}
>
> However, {{alter table dimension_components change comp_dimensions comp_dimensions struct<data:string,integrations:array<string>,template:string,variation:bigint,z:string>;}} fails with:
>
> {code}
> 15/08/08 23:13:07 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid column reference comp_dimensions
>     at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3584)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:312)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>     at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
>     at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
>     at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
>     at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
>     at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
>     ...
> {code}
>
> Meanwhile, {{SHOW COLUMNS in dimension_components}} lists two columns: {{col}} (which does not exist in the table) and {{z}}, which was just added. This suggests that DDL operations in Spark SQL use table metadata inconsistently.
>
> Full spark-sql output [here|https://gist.github.com/ssimeonov/636a25d6074a03aafa67].

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
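A minimal sketch of the refresh workaround Yin Huai suggests, using the table name from the report above. This is an illustrative fragment, not a tested program: it assumes a running Spark 1.4.x application built with Hive support and an existing {{SparkContext}} named {{sc}}.

```scala
// Sketch only: requires Spark 1.4.x with Hive support on the classpath.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc: an existing SparkContext

// After new columns appear in the underlying JSON files, invalidate the
// metadata cached on the driver so the next query replans against the
// latest self-described schema:
hiveContext.refreshTable("dimension_components")

// Equivalent SQL form:
hiveContext.sql("REFRESH TABLE dimension_components")
```

Either call drops the cached relation, so no {{ALTER TABLE ... CHANGE}} should be needed for self-describing sources; the new schema is re-read from the data files on the next access.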