Marcelo Vanzin created SPARK-21617:
--------------------------------------

             Summary: ALTER TABLE...ADD COLUMNS creates invalid metadata in Hive metastore for DS tables
                 Key: SPARK-21617
                 URL: https://issues.apache.org/jira/browse/SPARK-21617
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Marcelo Vanzin


When you have a data source table and you run an "ALTER TABLE...ADD COLUMNS" 
query, Spark saves invalid metadata to the Hive metastore.

Namely, it overwrites the table's schema in the metastore with the data frame's 
schema; that is not desired for data source tables, where the schema is stored 
in a table property instead.
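
To make the table-property point concrete, here is a minimal sketch in plain Python (not Spark code) of how a schema stored that way can be read back. The property keys {{spark.sql.sources.schema.numParts}} and {{spark.sql.sources.schema.part.N}} are, to my knowledge, the ones Spark uses; the helper {{schema_from_properties}} and the one-column table are hypothetical and exist only for illustration.

```python
import json

# Property keys Spark uses (to my knowledge) for a data source table's schema.
NUM_PARTS_KEY = "spark.sql.sources.schema.numParts"
PART_KEY_PREFIX = "spark.sql.sources.schema.part."


def schema_from_properties(props):
    """Hypothetical helper: rejoin the JSON schema split across properties."""
    num_parts = int(props[NUM_PARTS_KEY])
    parts = [props[PART_KEY_PREFIX + str(i)] for i in range(num_parts)]
    return json.loads("".join(parts))


# Made-up table with a single column; the JSON is split in two for illustration.
schema = {
    "type": "struct",
    "fields": [
        {"name": "c1", "type": "integer", "nullable": True, "metadata": {}},
    ],
}
encoded = json.dumps(schema)
half = len(encoded) // 2
props = {
    NUM_PARTS_KEY: "2",
    PART_KEY_PREFIX + "0": encoded[:half],
    PART_KEY_PREFIX + "1": encoded[half:],
}

restored = schema_from_properties(props)
column_names = [field["name"] for field in restored["fields"]]
```

The point of the sketch is that the authoritative schema lives in these properties, so whatever column list the metastore holds for such a table is not what should be overwritten.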

Moreover, if you use a newer metastore client where 
METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES is on by default, you actually 
get an exception:

{noformat}
InvalidOperationException(message:The following columns have types incompatible with the existing columns in their respective positions :
c1)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.throwExceptionIfIncompatibleColTypeChange(MetaStoreUtils.java:615)
        at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:133)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3704)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3675)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
        at com.sun.proxy.$Proxy26.alter_table_with_environment_context(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:402)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:309)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
        at com.sun.proxy.$Proxy27.alter_table_with_environmentContext(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:601)
{noformat}

Spark handles that exception in an odd way (see the code in 
{{HiveExternalCatalog.scala}}) and still ends up storing invalid metadata.
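
For readers unfamiliar with the metastore check, the exception message says that columns are compared by position, not by name. The following is a simplified Python model of that idea, not Hive's actual implementation: it treats two types as compatible only when they are equal, whereas the real check is more permissive. The function name {{check_col_type_changes}}, the stand-in exception class, and both column lists are hypothetical; in particular, the placeholder column in {{stored}} is an assumption about what the metastore might hold for a data source table.

```python
class InvalidOperationException(Exception):
    """Stand-in for the metastore exception of the same name."""


def check_col_type_changes(old_cols, new_cols):
    """Simplified model: compare (name, type) pairs position by position."""
    incompatible = [
        new_name
        for (_, old_type), (new_name, new_type) in zip(old_cols, new_cols)
        if old_type != new_type  # assumption: only identical types are compatible
    ]
    if incompatible:
        raise InvalidOperationException(
            "The following columns have types incompatible with the "
            "existing columns in their respective positions :\n"
            + ",".join(incompatible)
        )


# Hypothetical metastore column list versus the schema sent by ALTER TABLE.
stored = [("col", "array<string>")]
altered = [("c1", "int"), ("c2", "string")]

error = None
try:
    check_col_type_changes(stored, altered)
except InvalidOperationException as exc:
    error = str(exc)
```

Under these assumptions the first position changes type, so {{c1}} is reported even though a column was only being added. Appending a column while leaving the existing positions untouched passes the modeled check.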



