----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17678/#review33980 -----------------------------------------------------------
metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java <https://reviews.apache.org/r/17678/#comment63850> I think it's fine if we don't retry operation (creating default db, etc.) when MetaStore is constructed. Thus, the logic in this file can largely remain unchanged. - Xuefu Zhang On Feb. 6, 2014, 1:54 a.m., Szehon Ho wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/17678/ > ----------------------------------------------------------- > > (Updated Feb. 6, 2014, 1:54 a.m.) > > > Review request for hive. > > > Bugs: HIVE-4996 > https://issues.apache.org/jira/browse/HIVE-4996 > > > Repository: hive-git > > > Description > ------- > > Background: > First issue: > There are two levels of retrying in case of transient JDO/CP/DB errors: the > RetryingHMSHandler and RetryingRawStore. But the RetryingRawStore is flawed > in the case of a nested transaction of a larger RetryingHMSHandler > transaction (which is majority of cases). > Consider the following sample RetryingHMSHandler call, where variable "ms" is > a RetryingRawStore. > HMSHandler.createTable() > ms.open() //openTx = 1 > ms.getTable() // openTx = 2, then openTx = 1 upon intermediate commit > ms.createTable() //openTx = 2, then openTx = 1 upon intermediate commit > ms.commit(); //openTx = 0 > > If there is any transient error in any intermediate operation and > RetryingRawStore tries again, there will always be an unbalanced transaction, > like: > HMSHandler.createTable() > ms.open() //openTx = 1 > ms.getTable() // openTx = 2, transient error, then openTx=0 upon > rollback. After a retry, openTx=1, then openTx=0 upon successful > intermediate commit > ms.createTable() //openTx = 1, then openTx = 0 upon intermediate commit > ms.commit(); //unbalanced transaction! > > Retrying RawStore operations doesn't make sense in nested transaction cases, > as the first part of the transaction is rolled-back upon transient error, and > retry logic only saves a second half which may not make sense without the > first. It makes much more sense to retry the entire transaction from the > top, which is what RetryingHMSHandler would already be doing if the > RetryingRawStore did not interfere. > > Second issue: > The recent upgrade to BoneCP 0.8.0 seemed to cause more transient errors that > triggered this problem. In these cases, in-use connections are finalized, as > follows: > WARN bonecp.ConnectionPartition > (ConnectionPartition.java:finalizeReferent(162)) - BoneCP detected an > unclosed connection and will now attempt to close it for you. You should be > closing this connection in your application - enable connectionWatch for > additional debugging assistance or set disableConnectionTracking to true to > disable this feature entirely. > The retry of this operation seems to get a good connection and allow the > operation to proceed. Reading forums, it seems some others have hit this > issue after the upgrade, and switching back to 0.7.1 in our environment > eliminated this issue for us. But that reversion is outside the scope of > this JIRA, and would be better-done in either the original or follow-up JIRA > that upgraded the version. > > This fix targets the first issue only, as anyway it is needed for any sort of > transient error, not just the BoneCP one that I observed. > > Changes: > 1. Removes RetryingRawStore in favor of RetryingHMSHandler, and removes the > configuration property of the former. > 2. Addresses the resultant holes in retry, in particular in the > RetryingHMSHandler's construction of RawStore (before, RetryingRawStore would > have retried failures like in creating the defaultDB). It didn't seem > necessary to increase the default RetryingHMSHandler retries to 2 to > compensate, but I am open to that as well. > 3. Contribute the instrumentation code that helped me to find the issue. > This includes printing missing stacks of exceptions that triggered retry, > including 'unbalanced calls' errors to hive log, and adding debug-level > tracing of ObjectStore calls to give better correlation with other > errors/warnings in hive log. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 89c9349 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java > d7854fe > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRawStoreTxn.java > 0b87077 > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > ddc6aed > metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java > 0715e22 > metastore/src/java/org/apache/hadoop/hive/metastore/RawStoreProxy.java > PRE-CREATION > metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java > fb70589 > metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java > dcf97ec > > Diff: https://reviews.apache.org/r/17678/diff/ > > > Testing > ------- > > > Thanks, > > Szehon Ho > >