[jira] [Updated] (HBASE-6240) Race in HCM.getMaster stalls clients

ramkrishna.s.vasudevan (JIRA) Sun, 24 Jun 2012 22:05:46 -0700

     [ https://issues.apache.org/jira/browse/HBASE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ramkrishna.s.vasudevan updated HBASE-6240:
------------------------------------------

    Attachment: HBASE-6240_1_0.94.patch

Please check the latest patch.  With JD's patch, test cases related to master 
restarts were failing, such as TestMasterRestartAfterDisablingTable and 
TestSplitTransactionOnCluster.  I think we need to handle 
UndeclaredThrowableException before getting the new master, because 
master.isRunning will throw an exception once the master has already 
switched.  Maybe the problem is more prominent in our test case framework.  
Please review and provide your comments.  If it is OK, I can commit this today.
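
To illustrate the handling described above, here is a minimal sketch, not the 
attached patch. MasterProxy and isUsable are hypothetical names introduced for 
this example only; the assumption is that the liveness call goes through a 
proxy that can surface UndeclaredThrowableException when the old master is gone.

```java
import java.io.IOException;
import java.lang.reflect.UndeclaredThrowableException;

class MasterLivenessSketch {
    /** Hypothetical stand-in for the master RPC proxy; not the real HBase interface. */
    interface MasterProxy {
        boolean isRunning();
    }

    /**
     * Returns true only if the cached master answers the liveness call.
     * An UndeclaredThrowableException is treated as "not running" so the
     * caller falls through to fetching a new master instead of failing.
     */
    static boolean isUsable(MasterProxy master) {
        if (master == null) {
            return false;
        }
        try {
            return master.isRunning();
        } catch (UndeclaredThrowableException e) {
            // Assumption: the proxy wraps the failure caused by the master switch.
            return false;
        }
    }

    public static void main(String[] args) {
        MasterProxy live = () -> true;
        MasterProxy switched = () -> {
            throw new UndeclaredThrowableException(new IOException("master switched"));
        };
        System.out.println(isUsable(live));
        System.out.println(isUsable(switched));
    }
}
```

With this shape, a stale proxy is reported as unusable rather than propagating 
the exception out of the lookup path.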
                
> Race in HCM.getMaster stalls clients
> ------------------------------------
>
>                 Key: HBASE-6240
>                 URL: https://issues.apache.org/jira/browse/HBASE-6240
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.94.1
>
>         Attachments: HBASE-6240.patch, HBASE-6240_1_0.94.patch
>
>
> I found this issue while trying to run YCSB on 0.94; I don't think it exists 
> on any other branch. I believe that this was introduced in HBASE-5058 "Allow 
> HBaseAdmin to use an existing connection".
> The issue is that HCM.getMaster follows this recipe:
>  # Check if the master is non-null and running (if so, return it)
>  # Grab a lock on masterLock
>  # Nullify this.master
>  # Try to get a new master
> The issue happens at step 3: it should re-run step 1 first, since while you're 
> waiting on the lock someone else could have already fixed it for you. What 
> happens right now is that the threads are all able to set the master to null 
> before others are able to get out of getMaster, and it's a complete mess.
> Figuring it out took me some time because it doesn't manifest itself right 
> away; silent retries are done in the background. Basically the first clue was 
> this:
> {noformat}
> Error doing get: org.apache.hadoop.hbase.client.RetriesExhaustedException: 
> Failed after attempts=10, exceptions:
> Tue Jun 19 23:40:46 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:47 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:48 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:49 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:51 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:53 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:40:57 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:41:01 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:41:09 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> Tue Jun 19 23:41:25 UTC 2012, 
> org.apache.hadoop.hbase.client.HTable$3@571a4bd4, java.io.IOException: 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2eb0a3f5
>  closed
> {noformat}
> This was caused by the little dance up in HBaseAdmin where it deletes "stale" 
> connections... which are not stale at all.
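
The re-check suggested in the description (run step 1 again after acquiring 
masterLock) can be sketched as below. This is an illustrative sketch under 
stated assumptions, not the HBase code or either attached patch: the Master 
interface, connect(), and the connects counter are hypothetical and exist only 
to make the example self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;

class GetMasterSketch {
    /** Hypothetical stand-in for the master handle. */
    interface Master {
        boolean isRunning();
    }

    private final Object masterLock = new Object();
    private volatile Master master;
    /** Counts how many times a new master handle was created (for the demo only). */
    final AtomicInteger connects = new AtomicInteger();

    Master getMaster() {
        // Step 1: fast path, return the cached master if it is non-null and running.
        Master m = master;
        if (m != null && m.isRunning()) {
            return m;
        }
        // Step 2: grab the lock.
        synchronized (masterLock) {
            // Re-run step 1: another thread may have fixed the master while we waited.
            m = master;
            if (m != null && m.isRunning()) {
                return m;
            }
            // Steps 3 and 4: only now drop the old reference and get a new master.
            master = null;
            master = connect();
            return master;
        }
    }

    /** Hypothetical connection routine; always yields a running master here. */
    private Master connect() {
        connects.incrementAndGet();
        return () -> true;
    }

    public static void main(String[] args) throws InterruptedException {
        GetMasterSketch conn = new GetMasterSketch();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(conn::getMaster);
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("connects = " + conn.connects.get());
    }
}
```

Because the check is repeated inside the lock, threads that were queued behind 
the first one find a running master and return it instead of nullifying it 
again, so in this sketch only one connect happens no matter how many threads 
race.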

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
