[
https://issues.apache.org/jira/browse/HBASE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401172#comment-13401172
]
ramkrishna.s.vasudevan commented on HBASE-6240:
-----------------------------------------------
@JD
+1 on opening a follow-up JIRA. I can commit this today. Thanks.
> Race in HCM.getMaster stalls clients
> ------------------------------------
>
> Key: HBASE-6240
> URL: https://issues.apache.org/jira/browse/HBASE-6240
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.94.1
>
> Attachments: HBASE-6240.patch, HBASE-6240_1_0.94.patch
>
>
> I found this issue while trying to run YCSB on 0.94, and I don't think it
> exists on any other branch. I believe it was introduced in HBASE-5058 "Allow
> HBaseAdmin to use an existing connection".
> The issue is that HCM.getMaster follows this recipe:
> # Check if the master is not null and is running (if so, return it)
> # Grab a lock on masterLock
> # nullify this.master
> # try to get a new master
> The issue happens at step 3: it should re-run step 1 first, since while you're
> waiting on the lock someone else could have already fixed it for you. What
> happens right now is that each thread is able to set the master to null before
> the others are able to get out of getMaster, and it's a complete mess.
> Figuring it out took me some time because it doesn't manifest itself right
> away: silent retries are done in the background. Basically the first clue was
> this:
> {noformat}
> Error doing get: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=10, exceptions:
> Tue Jun 19 23:40:46 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:40:47 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:40:48 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:40:49 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:40:51 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:40:53 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:40:57 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:41:01 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:41:09 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> Tue Jun 19 23:41:25 UTC 2012,
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException:
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
> closed
> {noformat}
> This was caused by the little dance up in HBaseAdmin where it deletes "stale"
> connections... which are not stale at all.
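
The recipe in the description, with step 1 re-run after the lock is taken, can be sketched roughly as follows. This is only an illustration of the re-check-under-lock idea, not the attached patch: `Master`, `MasterCache`, `connect()` and the `connects` counter are hypothetical stand-ins, and only `getMaster` and `masterLock` are names taken from the issue text.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the master proxy; not the real HBase interface. */
interface Master {
    boolean isRunning();
}

/** Illustrative holder for a cached master reference (assumed names throughout). */
class MasterCache {
    private final Object masterLock = new Object();
    // Read without the lock on the fast path, so it has to be volatile.
    private volatile Master master;
    // Counts reconnects so the effect of the re-check can be observed.
    final AtomicInteger connects = new AtomicInteger();

    /** Hypothetical replacement for the real "try to get a new master" logic. */
    private Master connect() {
        connects.incrementAndGet();
        return () -> true; // a master that always reports itself as running
    }

    Master getMaster() {
        // Step 1: return the cached master if it is present and running.
        Master m = master;
        if (m != null && m.isRunning()) {
            return m;
        }
        // Step 2: serialize the threads that saw a missing or dead master.
        synchronized (masterLock) {
            // Re-run step 1: another thread may have reconnected while we waited.
            m = master;
            if (m != null && m.isRunning()) {
                return m;
            }
            // Steps 3 and 4: only now drop the old reference and reconnect.
            master = null;
            m = connect();
            master = m;
            return m;
        }
    }
}
```

Without the second check inside the `synchronized` block, every thread that queued on `masterLock` would null out a master that an earlier thread had just re-established, which is the behaviour the description reports.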