-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17678/
-----------------------------------------------------------

(Updated Feb. 3, 2014, 11:10 p.m.)


Review request for hive.


Bugs: HIVE-4996
    https://issues.apache.org/jira/browse/HIVE-4996


Repository: hive-git


Description (updated)
-------

Background:
First issue:
There are two levels of retrying in case of transient JDO/CP/DB errors: the 
RetryingHMSHandler and RetryingRawStore.  But the RetryingRawStore is flawed in 
the case of a nested transaction of a larger RetryingHMSHandler transaction 
(which is majority of cases).
Consider the following sample RetryingHMSHandler call, where variable "ms" is a 
RetryingRawStore.
   HMSHandler.createTable()
      ms.open()  //openTx = 1
      ms.getTable() // openTx = 2, then openTx = 1 upon intermediate commit
      ms.createTable()  //openTx = 2, then openTx = 1 upon intermediate commit
      ms.commit();  //openTx = 0

If there is any transient error in any intermediate operation and 
RetryingRawStore tries again, there will always be an unbalanced transaction, 
like:
  HMSHandler.createTable()
      ms.open()  //openTx = 1
      ms.getTable() // openTx = 2, transient error, then openTx=0 upon 
rollback.  After a retry, openTx=1, then openTx=0 upon successful intermediate 
commit 
      ms.createTable()  //openTx = 1, then openTx = 0 upon intermediate commit
      ms.commit();  //unbalanced transaction!

Retrying RawStore operations doesn't make sense in nested transaction cases, as 
the first part of the transaction is rolled-back upon transient error, and 
retry logic only saves a second half which may not make sense without the 
first.  It makes much more sense to retry the entire transaction from the top, 
which is what RetryingHMSHandler would already be doing if the RetryingRawStore 
did not interfere.

Second issue:
The recent upgrade to BoneCP 0.8.0 seemed to cause more transient errors that 
triggered this problem.  In these cases, in-use connections are finalized, as 
follows:
WARN  bonecp.ConnectionPartition 
(ConnectionPartition.java:finalizeReferent(162)) - BoneCP detected an unclosed 
connection and will now attempt to close it for you. You should be closing this 
connection in your application - enable connectionWatch for additional 
debugging assistance or set disableConnectionTracking to true to disable this 
feature entirely.
The retry of this operation seems to get a good connection and allow the 
operation to proceed.  Reading forums, it seems some others have hit this issue 
after the upgrade, and switching back to 0.7.1 in our environment eliminated 
this issue for us.  But that reversion is outside the scope of this JIRA, and 
would be better-done in either the original or follow-up JIRA that upgraded the 
version.  

This fix targets the first issue only, as anyway it is needed for any sort of 
transient error, not just the BoneCP one that I observed.

Changes:
1. Removes RetryingRawStore in favor of RetryingHMSHandler, and removes the 
configuration property of the former.
2. Addresses the resultant holes in retry, in particular in the 
RetryingHMSHandler's construction of RawStore (before, RetryingRawStore would 
have retried failures like in creating the defaultDB).  It didn't seem 
necessary to increase the default RetryingHMSHandler retries to 2 to 
compensate, but I am open to that as well.
3. Contribute the instrumentation code that helped me to find the issue.  This 
includes printing missing stacks of exceptions that triggered retry, including 
'unbalanced calls' errors to hive log, and adding debug-level tracing of 
ObjectStore calls to give better correlation with other errors/warnings in hive 
log.


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 22bb22d 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
 d7854fe 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestRawStoreTxn.java
 0b87077 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
2d8e483 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 0715e22 
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 
fb70589 
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingRawStore.java 
dcf97ec 

Diff: https://reviews.apache.org/r/17678/diff/


Testing
-------


Thanks,

Szehon Ho

Reply via email to