[ https://issues.apache.org/jira/browse/SPARK-9762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940082#comment-14940082 ]

Yin Huai commented on SPARK-9762:
---------------------------------

[~simeons] For Spark SQL's data source tables, if you are adding new columns, I 
think {{hiveContext.refreshTable}} or {{sql("REFRESH TABLE ...")}} is the way 
to let Spark pick up the new columns. Can you try it?

Basically, for self-describing data sources like JSON, Parquet, and ORC, we try 
to give you the latest schema automatically, without requiring you to set the 
schema explicitly (sometimes you need to call the refresh table method because 
we cache metadata on the driver side to reduce query planning time).
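
As a minimal sketch (Scala; assuming an existing {{SparkContext}} named {{sc}} and the reporter's table name), either form below invalidates the cached metadata so the next query re-infers the schema from the underlying files:

{code}
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc) // sc: an existing SparkContext

// New columns were added to the JSON files backing the table; drop the
// metadata cached on the driver so the next query re-infers the schema.
hiveContext.refreshTable("dimension_components")

// Equivalent SQL form:
hiveContext.sql("REFRESH TABLE dimension_components")
{code}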

> ALTER TABLE cannot find column
> ------------------------------
>
>                 Key: SPARK-9762
>                 URL: https://issues.apache.org/jira/browse/SPARK-9762
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>         Environment: Ubuntu on AWS
>            Reporter: Simeon Simeonov
>
> {{ALTER TABLE tbl CHANGE}} cannot find a column that {{DESCRIBE}} lists.
> In the case of a table generated with {{HiveContext.read.json()}}, the output 
> of {{DESCRIBE dimension_components}} is:
> {code}
> comp_config          struct<adText:string,adTextLeft:string,background:string,brand:string,button_color:string,cta_side:string,cta_type:string,depth:string,fixed_under:string,light:string,mid_text:string,oneline:string,overhang:string,shine:string,style:string,style_secondary:string,style_small:string,type:string>
> comp_criteria        string
> comp_data_model      string
> comp_dimensions      struct<data:string,integrations:array<string>,template:string,variation:bigint>
> comp_disabled        boolean
> comp_id              bigint
> comp_path            string
> comp_placementData   struct<mod:string>
> comp_slot_types      array<string>
> {code}
> However, {{alter table dimension_components change comp_dimensions comp_dimensions struct<data:string,integrations:array<string>,template:string,variation:bigint,z:string>;}} fails with:
> {code}
> 15/08/08 23:13:07 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid column reference comp_dimensions
>       at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3584)
>       at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:312)
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>       at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>       at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>       at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>       at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
>       at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
>       at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
>       at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
>       at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
>       at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
> ...
> {code}
> Meanwhile, {{SHOW COLUMNS in dimension_components}} lists two columns: 
> {{col}} (which does not exist in the table) and {{z}}, which was just added.
> This suggests that DDL operations in Spark SQL use table metadata 
> inconsistently.
> Full spark-sql output 
> [here|https://gist.github.com/ssimeonov/636a25d6074a03aafa67].


