[jira] [Commented] (HBASE-6240) Race in HCM.getMaster stalls clients

Jean-Daniel Cryans (JIRA) Mon, 25 Jun 2012 09:57:47 -0700

    [ https://issues.apache.org/jira/browse/HBASE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400643#comment-13400643 ]


Jean-Daniel Cryans commented on HBASE-6240:
-------------------------------------------

Ah yeah I completely overlooked that. Did you see how we get this exception? It 
looks so dirty in the code and now having it twice would look a lot worse.
                
> Race in HCM.getMaster stalls clients
> ------------------------------------
>
>                 Key: HBASE-6240
>                 URL: https://issues.apache.org/jira/browse/HBASE-6240
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.94.1
>
>         Attachments: HBASE-6240.patch, HBASE-6240_1_0.94.patch
>
>
> I found this issue while trying to run YCSB on 0.94; I don't think it exists 
> on any other branch. I believe it was introduced in HBASE-5058 "Allow 
> HBaseAdmin to use an existing connection".
> The issue is that HCM.getMaster follows this recipe:
>  # Check if the master is not null and is running (if so, return it)
>  # Grab a lock on masterLock
>  # Set this.master to null
>  # Try to get a new master
> The problem is at step 3: it should re-run step 1 first, since while a thread 
> is waiting on the lock another thread could already have fixed the master. 
> What happens right now is that the threads are all able to set the master to 
> null before the others are able to get out of getMaster, and it's a complete 
> mess.
> Figuring it out took me some time because it doesn't manifest itself right 
> away: silent retries are done in the background. The first clue was this:
> {noformat}
> Error doing get: org.apache.hadoop.hbase.client.RetriesExhaustedException: 
> Failed after attempts=10, exceptions:
> Tue Jun 19 23:40:46 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:47 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:48 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:49 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:51 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:53 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:57 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:41:01 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:41:09 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:41:25 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> {noformat}
> This was caused by the little dance up in HBaseAdmin where it deletes "stale" 
> connections... which are not stale at all.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
