[ 
https://issues.apache.org/jira/browse/IMPALA-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-14293:
---------------------------------
    Description: 
In Apache Hive, when a user changes the table property 'EXTERNAL' to false, 
Hive Server2 sets the table type of the table to be altered to 
'TableType.MANAGED_TABLE', whereas Impala's catalog server sets it to 
'TableType.EXTERNAL_TABLE'.

 

This difference affects how Hive Metastore (HMS) processes such a request to 
alter the table property of 'EXTERNAL'.

 

Specifically, when HMS is started with the Hive configuration 
'hive.strict.managed.tables' set to true and receives a request to set the 
table property 'EXTERNAL' to false, the outcome depends on the table type in 
the request. If the table type is 'TableType.MANAGED_TABLE', HMS rejects the 
request unless the table property 'transactional' is true. However, if the 
table type is 'TableType.EXTERNAL_TABLE', which is what Impala's catalog 
server currently sets, HMS allows the request to be executed.
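
The rule above can be sketched as a small self-contained model. This is not 
HMS code: the class 'StrictManagedCheckSketch', its local 'TableType' enum, and 
the method 'isAlterToNonExternalAllowed' are hypothetical names used only for 
illustration, and the real validation in HMS likely covers more cases than 
shown here.

```java
import java.util.Map;

// Hypothetical, simplified model of the behavior described above; not HMS code.
class StrictManagedCheckSketch {
  enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

  // Returns whether a request that sets 'EXTERNAL' to false would be accepted,
  // given the table type carried in the request and the table properties.
  static boolean isAlterToNonExternalAllowed(
      boolean strictManagedTables, TableType requestedType, Map<String, String> props) {
    if (!strictManagedTables) {
      // Assumption: the description only covers strict mode, so this sketch
      // applies no restriction when 'hive.strict.managed.tables' is false.
      return true;
    }
    if (requestedType == TableType.MANAGED_TABLE) {
      // In strict mode a managed table has to be transactional.
      return "true".equalsIgnoreCase(props.get("transactional"));
    }
    // With EXTERNAL_TABLE as the table type the request is not rejected.
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> nonTransactional = Map.of("EXTERNAL", "false");
    System.out.println("MANAGED_TABLE allowed: "
        + isAlterToNonExternalAllowed(true, TableType.MANAGED_TABLE, nonTransactional));
    System.out.println("EXTERNAL_TABLE allowed: "
        + isAlterToNonExternalAllowed(true, TableType.EXTERNAL_TABLE, nonTransactional));
  }
}
```

The same non-transactional table is rejected when the request carries 
MANAGED_TABLE and accepted when it carries EXTERNAL_TABLE, which is the 
difference between Hive Server2 and the catalog server described above.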

 

If a user is allowed to set the table property 'EXTERNAL' to false, then a 
later request to drop the table deletes the underlying table directory, 
causing data loss. HMS determines whether the table data should be deleted 
based on the following method in 
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java].
Note that for a drop table request, 'deleteData' is true; therefore, when the 
table property 'EXTERNAL' is false, this method returns the value of 
'deleteData', which is true.
{code:java}
  private boolean checkTableDataShouldBeDeleted(Table tbl, boolean deleteData) {
    if (deleteData && MetaStoreUtils.isExternalTable(tbl)) {
      // External table data can be deleted if EXTERNAL_TABLE_PURGE is true
      return MetaStoreUtils.isExternalTablePurge(tbl);
    }
    return deleteData;
  }
{code}
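
To make the outcome concrete, the method can be exercised in a standalone 
model. This sketch assumes that 'MetaStoreUtils.isExternalTable' and 
'MetaStoreUtils.isExternalTablePurge' read the table properties 'EXTERNAL' and 
'external.table.purge' respectively; the class 'DropTableDataSketch' is a 
hypothetical stand-in that takes a plain property map instead of an HMS 'Table'.

```java
import java.util.Map;

// Hypothetical stand-in for the HMS logic, operating on a property map.
class DropTableDataSketch {
  // Assumption: a table is external when 'EXTERNAL' is "true" (case-insensitive).
  static boolean isExternalTable(Map<String, String> params) {
    return "true".equalsIgnoreCase(params.get("EXTERNAL"));
  }

  // Assumption: purge is requested via the 'external.table.purge' property.
  static boolean isExternalTablePurge(Map<String, String> params) {
    return "true".equalsIgnoreCase(params.get("external.table.purge"));
  }

  // Mirrors the structure of checkTableDataShouldBeDeleted() quoted above.
  static boolean checkTableDataShouldBeDeleted(Map<String, String> params, boolean deleteData) {
    if (deleteData && isExternalTable(params)) {
      return isExternalTablePurge(params);
    }
    return deleteData;
  }

  public static void main(String[] args) {
    System.out.println("EXTERNAL=true:  "
        + checkTableDataShouldBeDeleted(Map.of("EXTERNAL", "true"), true));
    System.out.println("EXTERNAL=false: "
        + checkTableDataShouldBeDeleted(Map.of("EXTERNAL", "false"), true));
  }
}
```

Under these assumptions, a table whose 'EXTERNAL' property is true keeps its 
data on drop unless purge is requested, while the same table with 'EXTERNAL' 
set to false has its data deleted.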
 

Catalog server's behavior should be aligned with that of Hive Server2 to 
avoid the data loss scenario described above.
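
One possible shape of the aligned behavior is sketched below. This is not 
Impala's catalog server code: the class 'TableTypeAlignmentSketch' and the 
method 'tableTypeFor' are hypothetical. Mapping 'EXTERNAL' = true to 
EXTERNAL_TABLE, treating any other value as false, and leaving the type 
unchanged when the property is absent are all assumptions of this sketch.

```java
import java.util.Map;

// Hypothetical sketch of deriving the table type from the 'EXTERNAL' property.
class TableTypeAlignmentSketch {
  enum TableType { MANAGED_TABLE, EXTERNAL_TABLE }

  // Picks the table type to send to HMS for an ALTER that sets 'newProps'.
  static TableType tableTypeFor(Map<String, String> newProps, TableType currentType) {
    String external = newProps.get("EXTERNAL");
    if (external == null) {
      // The request does not touch 'EXTERNAL'; keep the current type.
      return currentType;
    }
    // 'EXTERNAL' = false maps to MANAGED_TABLE, matching the Hive Server2
    // behavior described above, so that HMS can apply its strict-mode check.
    return "true".equalsIgnoreCase(external)
        ? TableType.EXTERNAL_TABLE
        : TableType.MANAGED_TABLE;
  }

  public static void main(String[] args) {
    System.out.println(
        tableTypeFor(Map.of("EXTERNAL", "false"), TableType.EXTERNAL_TABLE));
  }
}
```

With the table type set to MANAGED_TABLE in the request, HMS would get the 
chance to reject the change for non-transactional tables when 
'hive.strict.managed.tables' is true.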

 


> Set the table type to MANAGED_TABLE when setting table property of EXTERNAL 
> to false
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14293
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14293
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major


