[ https://issues.apache.org/jira/browse/HBASE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403278#comment-13403278 ]
Hudson commented on HBASE-6240:
-------------------------------

Integrated in HBase-0.94-security #38 (See [https://builds.apache.org/job/HBase-0.94-security/38/])
HBASE-6240 Race in HCM.getMaster stalls clients
Submitted by: J-D, Ram
Reviewed by: J-D, Ted (Revision 1354116)

Result = FAILURE
ramkrishna :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java

> Race in HCM.getMaster stalls clients
> ------------------------------------
>
> Key: HBASE-6240
> URL: https://issues.apache.org/jira/browse/HBASE-6240
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0
> Reporter: Jean-Daniel Cryans
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 0.94.1
>
> Attachments: HBASE-6240.patch, HBASE-6240_1_0.94.patch
>
>
> I found this issue while trying to run YCSB on 0.94; I don't think it exists on any other branch. I believe that this was introduced in HBASE-5058 "Allow HBaseAdmin to use an existing connection".
> The issue is that HCM.getMaster follows this recipe:
> # Check if the master is not null and is running (if so, return it)
> # Grab a lock on masterLock
> # nullify this.master
> # try to get a new master
> The problem is at step 3: it should re-run step 1 first, since while you're waiting on the lock someone else could have already fixed it for you. What happens right now is that the threads are all able to set the master to null before others are able to get out of getMaster, and it's a complete mess.
> Figuring it out took me some time because it doesn't manifest itself right away; silent retries are done in the background.
> Basically the first clue was this:
> {noformat}
> Error doing get: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
> Tue Jun 19 23:40:46 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:47 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:48 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:49 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:51 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:53 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:40:57 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:41:01 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:41:09 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> Tue Jun 19 23:41:25 UTC 2012, org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5 closed
> {noformat}
> This was caused by the little dance up in HBaseAdmin where it deletes "stale" connections... which are not stale at all.
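
The getMaster recipe quoted in the issue, with the check from step 1 repeated once the lock is held, can be sketched roughly as follows. This is only an illustration of the re-check the report asks for, not the committed patch: `Master`, `MasterLookup`, `connect()` and `connectCount()` are hypothetical names invented for the example and do not correspond to the actual HBase classes or methods.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the master proxy; not the HBase interface. */
interface Master {
    boolean isRunning();
}

/** Sketch of the lookup described in the issue, with the re-check added. */
class MasterLookup {
    private final Object masterLock = new Object();
    // volatile so the unlocked fast path sees the latest published handle
    private volatile Master master;
    // counts reconnect attempts, only so the behaviour can be observed
    private final AtomicInteger connects = new AtomicInteger();

    Master getMaster() {
        // Step 1: fast path, reuse a cached master that is still running.
        Master m = master;
        if (m != null && m.isRunning()) {
            return m;
        }
        // Step 2: serialize the slow path.
        synchronized (masterLock) {
            // Re-run step 1: another thread may have reconnected
            // while this one was waiting on the lock.
            m = master;
            if (m != null && m.isRunning()) {
                return m;
            }
            // Steps 3 and 4: only now drop the stale handle and reconnect.
            master = null;
            master = connect();
            return master;
        }
    }

    /** Placeholder for the real connection attempt (assumed to succeed). */
    private Master connect() {
        connects.incrementAndGet();
        return () -> true;
    }

    int connectCount() {
        return connects.get();
    }
}
```

With the second check in place, a thread that was blocked on `masterLock` returns the handle its predecessor just installed instead of nulling it out again, which is the behaviour the description says is missing.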