> On March 15, 2017, 6:24 p.m., pengcheng xiong wrote: > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > > Line 27 (original), 27 (patched) > > <https://reviews.apache.org/r/57566/diff/4/?file=1665117#file1665117line27> > > > > I have a general question to understand the problem that you would like > > to resolve. I tried on current master. If there are two tables a and b and > > I rename a to b, I can see the error but a's table and column stats are not > > affected. If i change a's location to an existing one, its column and table > > stats disappeared. This is expected as we are now using the new location. > > Could u be talk more about how to reproduce the problem? Thanks. > > Chaoyu Tang wrote: > Hi, PengCheng, thanks for reviewing the patch. The current logic to > rename a managed table is as following: > 1. make changes to table metadata (e.g. table name, data location, and > partition data location if it has partitions) and column stats (since > TAB_COL_STAT table has the table_name feild), then commit the transaction. > 2. if transaction commit succeeds, alter table then moves data to the > renamed table location, otherwise the transaction is rolled back, therefore > no changes happen in backend DB. > {code} > if (!success) { > msdb.rollbackTransaction(); > } > > if (success && moveData) { > // change the file name in hdfs > // check that src exists otherwise there is no need to copy the > data > // rename the src to destination > try { > if (srcFs.exists(srcPath) && !srcFs.rename(srcPath, destPath)) { > throw new IOException("Renaming " + srcPath + " to " + > destPath + " failed"); > } > } > {code} > 3. if move data fails, for example, srcFs.exists(srcPath) && > !srcFs.rename(srcPath, destPath) throws IOException for encryption zone > incompatibility issue etc, the operation starts another transaction to revert > the DB changes which have been committed in step 1. > {code} > msdb.openTransaction(); > msdb.alterTable(newt.getDbName(), newt.getTableName(), oldt); > for (ObjectPair<Partition, String> pair : altps) { > Partition part = pair.getFirst(); > part.getSd().setLocation(pair.getSecond()); > msdb.alterPartition(newt.getDbName(), name, > part.getValues(), part); > } > revertMetaDataTransaction = msdb.commitTransaction(); > {code} > But you might notice that the line of code > msdb.alterTable(newt.getDbName(), newt.getTableName(), oldt) just reverts the > Hive table metadata (e.g. table name in TBLS, location in SD), but not the > table name in TAB_COL_STATS. It causes two consequences: a) you could no more > get the table col stats because the entries in TAB_COL_STATS now have a > different table name, the one you used in the failed rename; b) you even can > not drop this Hive table, because these entries in TAB_COL_STATS can not > deleted due to mismatched table name but they still hold a reference (TBL_ID) > to this Hive table in TBLS. > > How to fix this issue: > We might add the logic to revert the change to TAB_COL_STATS in step 3. > But it is not an ideal approach since the reverting could fail as well. In > additon, I think the logic alter table is a little convulted. A simple way > might be a). first move data for the altered table; b). if moving succeeds, > commit the transaction for DB changes (e.g. table metadata and column stats), > otherwise roll back the transaction. c) if the transaction is rolled back but > data has been moved, moving the data back. Reverting data location is easier > than reverting metadata changes in DB given so many backend tables are > involved. > > It might be easier to reproduce the issues using encryption_move_tbl.q > which can be run using command "mvn test -Dtest=TestEncryptedHDFSCliDriver > -Dqfile=encryption_move_tbl.q". You need modify the encryption_move_tbl.q to > compute the columns stats and describe these stats for table encrypted_table: > {code}INSERT OVERWRITE TABLE encrypted_table SELECT * FROM src; > SHOW TABLES; > ANALYZE TABLE encrypted_table COMPUTE STATISTICS FOR COLUMNS; > DESCRIBE FORMATTED encrypted_table key; > DESCRIBE FORMATTED encrypted_table value; > > -- should fail, since they are in different encryption zones, but table > columns statistics should not change > ALTER TABLE default.encrypted_table RENAME TO > encrypted_db.encrypted_table_2; > SHOW TABLES; > DESCRIBE FORMATTED encrypted_table key; > DESCRIBE FORMATTED encrypted_table value; > {code} > You will see that columns stats are all gone after the failed alter table > due to EZ incompatibility. You could even not drop table encrypted_table > because of the corrupted column stats in TAB_COL_STATS.
The issue happens only when the DB changes for table metadata/column stats have committed but moving data fails. The cases you tried: 1. If there are two tables a and b and I rename a to b, I can see the error but a's table and column stats are not affected. -- it might be because DB changes transaction is rolled back, in which case moving data never happens, therefore no data moving failure. 2. If i change a's location to an existing one, its column and table stats disappeared -- yes, it is expected. - Chaoyu ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/57566/#review169037 ----------------------------------------------------------- On March 15, 2017, 1:58 a.m., Chaoyu Tang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/57566/ > ----------------------------------------------------------- > > (Updated March 15, 2017, 1:58 a.m.) > > > Review request for hive. > > > Bugs: HIVE-16189 > https://issues.apache.org/jira/browse/HIVE-16189 > > > Repository: hive-git > > > Description > ------- > > If the table rename does not succeed due to its failure in moving the data to > the new renamed table folder, the changes in TAB_COL_STATS are not rolled > back which leads to invalid column stats. > > This patch changes the order of metadata update and data move in alter table > rename operation, which makes it easier to roll back metadata changes when > moving data fails in rename a table. > > > Diffs > ----- > > metastore/src/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java > bae39ac > ql/src/test/queries/clientpositive/encryption_move_tbl.q 7a5de7b > ql/src/test/results/clientpositive/encrypted/encryption_move_tbl.q.out > cc363ac > > > Diff: https://reviews.apache.org/r/57566/diff/4/ > > > Testing > ------- > > > Thanks, > > Chaoyu Tang > >