Re: Review Request 17678: HIVE-4996 unbalanced calls to openTransaction/commitTransaction

Xuefu Zhang Fri, 07 Feb 2014 15:40:43 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17678/#review33980
-----------------------------------------------------------




metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java
<https://reviews.apache.org/r/17678/#comment63850>

    I think it's fine if we don't retry operation (creating default db, etc.) 
when MetaStore is constructed. Thus, the logic in this file can largely remain 
unchanged.


- Xuefu Zhang


On Feb. 6, 2014, 1:54 a.m., Szehon Ho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17678/
> -----------------------------------------------------------
> 
> (Updated Feb. 6, 2014, 1:54 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-4996
>     https://issues.apache.org/jira/browse/HIVE-4996
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Background:
> First issue:
> There are two levels of retrying in case of transient JDO/CP/DB errors: the 
> RetryingHMSHandler and RetryingRawStore.  But the RetryingRawStore is flawed 
> in the case of a nested transaction of a larger RetryingHMSHandler 
> transaction (which is majority of cases).
> Consider the following sample RetryingHMSHandler call, where variable "ms" is 
> a RetryingRawStore.
>    HMSHandler.createTable()
>       ms.open()  //openTx = 1
>       ms.getTable() // openTx = 2, then openTx = 1 upon intermediate commit
>       ms.createTable()  //openTx = 2, then openTx = 1 upon intermediate commit
>       ms.commit();  //openTx = 0
> 
> If there is any transient error in any intermediate operation and 
> RetryingRawStore tries again, there will always be an unbalanced transaction, 
> like:
>   HMSHandler.createTable()
>       ms.open()  //openTx = 1
>       ms.getTable() // openTx = 2, transient error, then openTx=0 upon 
> rollback.  After a retry, openTx=1, then openTx=0 upon successful 
> intermediate commit 
>       ms.createTable()  //openTx = 1, then openTx = 0 upon intermediate commit
>       ms.commit();  //unbalanced transaction!
> 
> Retrying RawStore operations doesn't make sense in nested transaction cases, 
> as the first part of the transaction is rolled-back upon transient error, and 
> retry logic only saves a second half which may not make sense without the 
> first.  It makes much more sense to retry the entire transaction from the 
> top, which is what RetryingHMSHandler would already be doing if the 
> RetryingRawStore did not interfere.
> 
> Second issue:
> The recent upgrade to BoneCP 0.8.0 seemed to cause more transient errors that 
> triggered this problem.  In these cases, in-use connections are finalized, as 
> follows:
> WARN  bonecp.ConnectionPartition 
> (ConnectionPartition.java:finalizeReferent(162)) - BoneCP detected an 
> unclosed connection and will now attempt to close it for you. You should be 
> closing this connection in your application - enable connectionWatch for 
> additional debugging assistance or set disableConnectionTracking to true to 
> disable this feature entirely.
> The retry of this operation seems to get a good connection and allow the 
> operation to proceed.  Reading forums, it seems some others have hit this 
> issue after the upgrade, and switching back to 0.7.1 in our environment 
> eliminated this issue for us.  But that reversion is outside the scope of 
> this JIRA, and would be better-done in either the original or follow-up JIRA 
> that upgraded the version.  
> 
> This fix targets the first issue only, as anyway it is needed for any sort of 
> transient error, not just the BoneCP one that I observed.
> 
> Changes:
> 1. Removes RetryingRawStore in favor of RetryingHMSHandler, and removes the 
> configuration property of the former.
> 2. Addresses the resultant holes in retry, in particular in the 
> RetryingHMSHandler's construction of RawStore (before, RetryingRawStore would 
> have retried failures like in creating the defaultDB).  It didn't seem 
> necessary to increase the default RetryingHMSHandler retries to 2 to 
> compensate, but I am open to that as well.
> 3. Contribute the instrumentation code that helped me to find the issue.  
> This includes printing missing stacks of exceptions that triggered retry, 
> including 'unbalanced calls' errors to hive log, and adding debug-level 
> tracing of ObjectStore calls to give better correlation with other 
> errors/warnings in hive log.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 89c9349 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
>  d7854fe 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRawStoreTxn.java
>  0b87077 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
> ddc6aed 
>   metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 
> 0715e22 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RawStoreProxy.java 
> PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 
> fb70589 
>   metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java 
> dcf97ec 
> 
> Diff: https://reviews.apache.org/r/17678/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>

Re: Review Request 17678: HIVE-4996 unbalanced calls to openTransaction/commitTransaction

Reply via email to