[
https://issues.apache.org/jira/browse/HBASE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104735#comment-13104735
]
Jean-Daniel Cryans commented on HBASE-4395:
-------------------------------------------
bq. I think regions.size() should increase in the above loop. So I don't
understand the condition for if above.
Yeah, that was a last-minute change. I actually tested with "regions.size() >
lastNumberOfRegions" and then thought that the number was going down; I was
confusing it with regionsToAssign().
bq. Also, remaining is calculated lastly. I don't know why remaining is updated
in the if block.
Derp, sorry, it should be the timeout that's incremented.
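To make the intent concrete, here is a minimal sketch of a wait loop that pushes the deadline out whenever the count of online regions grows. It is not the code from the attached patch; the class name, the method name, and the IntSupplier standing in for regions.size() are all assumptions made for illustration.

```java
import java.util.function.IntSupplier;

public class WaitForRegions {
    /**
     * Hypothetical helper: polls onlineCount (a stand-in for regions.size())
     * until it reaches expected. Each time the count grows, the deadline is
     * reset, so only a stall of timeoutMs without progress gives up.
     */
    static boolean waitUntilDone(IntSupplier onlineCount, int expected,
                                 long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        int lastNumberOfRegions = 0;
        while (true) {
            int online = onlineCount.getAsInt();
            if (online >= expected) {
                return true;
            }
            if (online > lastNumberOfRegions) {
                // Progress was made: it is the timeout that gets extended.
                lastNumberOfRegions = online;
                deadline = System.currentTimeMillis() + timeoutMs;
            }
            // remaining is computed last, from the possibly extended deadline.
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.min(pollMs, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The point of the design is that a slow but steadily progressing enable never times out, while a stalled one still does.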
> EnableTableHandler races with itself
> ------------------------------------
>
> Key: HBASE-4395
> URL: https://issues.apache.org/jira/browse/HBASE-4395
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.4
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.90.5
>
> Attachments: HBASE-4395-0.90.patch
>
>
> Very often when we try to enable a big table we get something like:
> {quote}
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> java.lang.IllegalStateException
> at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1074)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1030)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)
> at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)
> at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2011-09-02 12:21:56,620 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> {quote}
> The issue is that EnableTableHandler runs BulkEnabler multiple times, and by
> the time it runs it a second time, using a stale list of still-not-enabled
> regions, it can try to set a region offline in ZK just after that region's
> state has changed. Case in point:
> {quote}
> 2011-09-02 12:21:56,616 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region huge_ass_region_name to sv4r23s16,60020,1314880035029
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> {quote}
> Here the first line is the first assign, done in the first thread, and the
> second line is the second thread processing the same region 3ms later. After
> that, the master dies, and it's pretty sad when it restarts because it has to
> fail over a table that is still being enabled, which is ungodly slow.
> I'm pretty sure there's a window where double assignments are possible.
> Talking with Stack, it doesn't really make sense to call multiple enables
> since the list of regions is static (the table is disabled!). We should just
> call it and wait. Also there's a lot of cleanup to do in EnableTableHandler
> since it refers to disabling the table (copy pasta I guess).
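
The failure mode described above can be illustrated with a deliberately simplified, single-threaded model. This is not HBase code: the class, the State enum, and both methods are hypothetical, and the state check only loosely mirrors what the stack trace suggests setOfflineInZooKeeper enforces. It shows why a second bulk pass over a stale region list trips the check, while a single pass over the static list does not.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EnableRaceSketch {
    // Hypothetical, reduced set of region states.
    enum State { OFFLINE, PENDING_OPEN, OPEN }

    static final Map<String, State> states = new ConcurrentHashMap<>();

    // Assumed rule for this sketch: a region may only be moved through
    // OFFLINE to PENDING_OPEN if it is not already in transition.
    static void assign(String region) {
        State current = states.getOrDefault(region, State.OFFLINE);
        if (current != State.OFFLINE) {
            throw new IllegalStateException(
                "Unexpected state trying to OFFLINE; " + region
                + " state=" + current);
        }
        states.put(region, State.PENDING_OPEN);
    }

    // One bulk pass over the region list, as a stand-in for one BulkEnabler run.
    static void bulkAssign(List<String> regions) {
        for (String region : regions) {
            assign(region);
        }
    }
}
```

Calling bulkAssign once and then waiting leaves every region in PENDING_OPEN; calling it again with the same list throws on the first region, which is the shape of the abort in the log above.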