[jira] [Commented] (HBASE-4395) EnableTableHandler races with itself

Jean-Daniel Cryans (JIRA) Wed, 14 Sep 2011 11:09:31 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104735#comment-13104735
 ]


Jean-Daniel Cryans commented on HBASE-4395:
-------------------------------------------

bq. I think regions.size() should increase in the above loop. So I don't 
understand the condition for if above.

Yeah, that was a last-minute change. I actually tested with "regions.size() > 
lastNumberOfRegions" and then thought that the number was going down; I was 
confusing it with regionsToAssign().

bq. Also, remaining is calculated lastly. I don't know why remaining is updated 
in the if block.

Derp, sorry, it should be the timeout that's incremented.
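
To make the two points above concrete, here is a minimal sketch of a wait loop that extends its deadline whenever the region count grows. It is not the code in the attached patch: the class name, the method name, the IntSupplier parameter and the polling interval are all assumptions made for illustration, and only lastNumberOfRegions and "remaining" are names taken from the discussion.

```java
import java.util.function.IntSupplier;

// Hypothetical illustration, not the HBASE-4395 patch.
final class EnableWaitSketch {

    /**
     * Polls onlineRegionCount until it reaches expected. Each time the count
     * grows past lastNumberOfRegions, the deadline is pushed out by timeoutMs,
     * so only a stall of timeoutMs without progress makes the wait give up.
     */
    static boolean waitUntilDone(IntSupplier onlineRegionCount, int expected,
                                 long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        int lastNumberOfRegions = 0;
        while (true) {
            int online = onlineRegionCount.getAsInt();
            if (online >= expected) {
                return true;
            }
            if (online > lastNumberOfRegions) {
                // Progress was made: extend the timeout, not "remaining".
                lastNumberOfRegions = online;
                deadline = System.currentTimeMillis() + timeoutMs;
            }
            // "remaining" is derived from the deadline on every iteration.
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.min(pollMs, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The design choice here is that "remaining" is never updated directly; it is always recomputed from the deadline, and the deadline is the only thing that moves when the count goes up.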

> EnableTableHandler races with itself
> ------------------------------------
>
>                 Key: HBASE-4395
>                 URL: https://issues.apache.org/jira/browse/HBASE-4395
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4395-0.90.patch
>
>
> Very often when we try to enable a big table we get something like:
> {quote}
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> java.lang.IllegalStateException
>         at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1074)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1030)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)
>         at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2011-09-02 12:21:56,620 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> {quote}
> The issue is that EnableTableHandler runs BulkEnabler multiple times, and the 
> second run can use a stale list of still-not-enabled regions. It may then try 
> to set a region offline in ZK just after that region's state has changed. 
> Case in point:
> {quote}
> 2011-09-02 12:21:56,616 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region huge_ass_region_name to sv4r23s16,60020,1314880035029
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> {quote}
> Here the first line is the first assign done in the first thread, and the 
> second line is the second thread that got to process the same region at 
> about the same time, 3ms apart. After that, the master dies, and it's 
> pretty sad when it restarts because it has to fail over a table that is 
> still enabling, which is ungodly slow.
> I'm pretty sure there's a window where double assignments are possible.
> Talking with Stack, it doesn't really make sense to call multiple enables 
> since the list of regions is static (the table is disabled!). We should just 
> call it and wait. Also there's a lot of cleanup to do in EnableTableHandler 
> since it refers to disabling the table (copy pasta I guess).
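
The failure described in the issue can be reproduced in miniature. The sketch below is a simplified, single-threaded model and not HBase code: DoubleAssignSketch, its State enum and its methods are hypothetical names, and the guard is reduced to "a region must still be OFFLINE to be assigned". It only shows why a second bulk assign over the same stale list hits an IllegalStateException, and why assigning once and then waiting avoids it.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race; not the real AssignmentManager.
final class DoubleAssignSketch {

    enum State { OFFLINE, PENDING_OPEN, OPEN }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    DoubleAssignSketch(List<String> regions) {
        // A disabled table: every region starts OFFLINE.
        for (String region : regions) {
            states.put(region, State.OFFLINE);
        }
    }

    /** Assigns one region; refuses if the region has already left OFFLINE. */
    void assign(String region) {
        State current = states.get(region);
        if (current != State.OFFLINE) {
            throw new IllegalStateException(
                "Unexpected state trying to OFFLINE; " + region + " state=" + current);
        }
        states.put(region, State.PENDING_OPEN);
    }

    /** Assigns every region in the given list, in order. */
    void bulkAssign(List<String> regions) {
        for (String region : regions) {
            assign(region);
        }
    }

    State stateOf(String region) {
        return states.get(region);
    }
}
```

Calling bulkAssign once moves every region to PENDING_OPEN. Calling it again with the same list throws on the first region, which is the model's version of the FATAL line in the log. Since the list cannot change while the table is disabled, a single call followed by a wait is enough.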

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
