[
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183122#comment-13183122
]
ramkrishna.s.vasudevan commented on HBASE-5155:
-----------------------------------------------
{code}
2012-01-10 11:43:34,303 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT: j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4.:
Daughters; j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7.,
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from
linux-129,60020,1326175677339
2012-01-10 12:05:19,122 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs
for linux-129,60020,1326175677339
2012-01-10 12:06:07,153 DEBUG org.apache.hadoop.hbase.master.HMaster: Not
running balancer because processing dead regionserver(s):
[linux-129,60020,1326175677339]
2012-01-10 12:09:57,865 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 7
region(s) that linux-129,60020,1326175677339 was carrying (skipping 0
regions(s) that are already in transition)
2012-01-10 12:11:30,988 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to
disable table j9t6
2012-01-10 12:12:21,513 INFO
org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is
done=true
2012-01-10 12:13:41,624 INFO
org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table
operation C_M_DELETE_TABLE on table j9t6
2012-01-10 12:14:00,811 DEBUG
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region
j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4. from META and FS
2012-01-10 12:14:02,230 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4. from META
2012-01-10 12:14:07,330 DEBUG
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region
j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from META and FS
2012-01-10 12:14:07,521 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from META
2012-01-10 12:14:09,860 DEBUG
org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from META
and FS
2012-01-10 12:14:10,096 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from META
2012-01-10 12:18:11,081 DEBUG
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and
split region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4.; checking
daughter presence
2012-01-10 12:18:46,450 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing
daughter j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7.
2012-01-10 12:18:46,775 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added
daughter j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. in region
.META.,,1, serverInfo=null
2012-01-10 12:18:47,135 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x134c5dbd0a60000 Creating (or updating) unassigned node for
49c3665a4bc656f3f6473659b64798f7 with OFFLINE state
2012-01-10 12:18:47,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
No previous transition plan was found (or we are ignoring an existing plan) for
j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. so generated a random
one; hri=j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7., src=,
dest=linux146,60020,1326169560093; 1 (online=1, exclude=null) available servers
2012-01-10 12:18:47,143 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Assigning region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. to
linux146,60020,1326169560093
2012-01-10 12:18:47,155 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handle region called from node nodeDataChanged
2012-01-10 12:18:47,155 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093,
region=49c3665a4bc656f3f6473659b64798f7
2012-01-10 12:18:47,202 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handle region called from node nodeDataChanged
2012-01-10 12:18:47,202 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093,
region=49c3665a4bc656f3f6473659b64798f7
2012-01-10 12:18:47,221 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handle region called from node nodeDataChanged
2012-01-10 12:18:47,221 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1326169560093,
region=49c3665a4bc656f3f6473659b64798f7
2012-01-10 12:18:47,222 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED
event for j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from
serverName=linux146,60020,1326169560093, load=(requests=0, regions=7,
usedHeap=30, maxHeap=996); deleting unassigned node
2012-01-10 12:18:47,222 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x134c5dbd0a60000 Deleting existing unassigned node for
49c3665a4bc656f3f6473659b64798f7 that is in expected state RS_ZK_REGION_OPENED
2012-01-10 12:18:47,230 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x134c5dbd0a60000 Successfully deleted unassigned node for region
49c3665a4bc656f3f6473659b64798f7 in expected state RS_ZK_REGION_OPENED
2012-01-10 12:18:47,232 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has
opened the region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. that
was online on serverName=linux146,60020,1326169560093, load=(requests=0,
regions=7, usedHeap=30, maxHeap=996)
2012-01-10 12:19:01,801 INFO
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing
daughter j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc.
2012-01-10 12:19:02,261 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added
daughter j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. in
region .META.,,1, serverInfo=null
2012-01-10 12:19:02,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x134c5dbd0a60000 Creating (or updating) unassigned node for
0b96b5ed4c0426d3b3f13e586179c9bc with OFFLINE state
2012-01-10 12:19:02,992 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
No previous transition plan was found (or we are ignoring an existing plan) for
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. so
generated a random one;
hri=j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc., src=,
dest=linux146,60020,1326169560093; 1 (online=1, exclude=null) available servers
2012-01-10 12:19:02,992 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Assigning region
j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. to
linux146,60020,1326169560093
2012-01-10 12:19:03,062 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handle region called from node nodeDataChanged
2012-01-10 12:19:03,062 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093,
region=0b96b5ed4c0426d3b3f13e586179c9bc
2012-01-10 12:19:03,107 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handle region called from node nodeDataChanged
2012-01-10 12:19:03,108 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093,
region=0b96b5ed4c0426d3b3f13e586179c9bc
2012-01-10 12:19:03,164 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handle region called from node nodeDataChanged
2012-01-10 12:19:03,164 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1326169560093,
region=0b96b5ed4c0426d3b3f13e586179c9bc
2012-01-10 12:19:03,165 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED
event for j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc.
from serverName=linux146,60020,1326169560093, load=(requests=11, regions=8,
usedHeap=33, maxHeap=996); deleting unassigned node
2012-01-10 12:19:03,165 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x134c5dbd0a60000 Deleting existing unassigned node for
0b96b5ed4c0426d3b3f13e586179c9bc that is in expected state RS_ZK_REGION_OPENED
2012-01-10 12:19:03,169 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x134c5dbd0a60000 Successfully deleted unassigned node for region
0b96b5ed4c0426d3b3f13e586179c9bc in expected state RS_ZK_REGION_OPENED
{code}
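Reading the log above: the last region of j9t6 is deleted from META at 12:14:10, yet ServerShutdownHandler logs "Fixup; missing daughter" at 12:18:46 and 12:19:01 and puts both daughters back. The ordering can be sketched with a small in-memory model. This is only an illustration, not HBase code: the class name, the `replay` method, the `recheckParent` guard and the short region names R1/D1/D2 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of the race; not HBase code. */
public class StaleMetaSnapshotSketch {

    /** Replays the ordering seen in the log and returns the final META contents. */
    public static Set<String> replay(boolean recheckParent) {
        // Stand-in for the META table: just a set of region names.
        Set<String> meta = new TreeSet<>();
        meta.add("R1"); // split parent
        meta.add("D1"); // daughter
        meta.add("D2"); // daughter

        // 1. The shutdown handler scans META once and keeps the result.
        List<String> snapshot = new ArrayList<>(meta);

        // 2. Meanwhile the table is disabled and deleted, so META is emptied.
        meta.clear();

        // 3. The shutdown handler now walks its stale snapshot.
        for (String region : snapshot) {
            if (!region.equals("R1")) {
                continue; // in this model only the split parent triggers the fixup
            }
            if (recheckParent && !meta.contains(region)) {
                continue; // hypothetical guard: parent is gone, so skip the fixup
            }
            for (String daughter : new String[] {"D1", "D2"}) {
                if (!meta.contains(daughter)) {
                    meta.add(daughter); // corresponds to "Fixup; missing daughter"
                }
            }
        }
        return meta;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + replay(false));
        System.out.println("with guard:    " + replay(true));
    }
}
```

Without the guard the model ends with D1 and D2 back in META. With the guard META stays empty. Whether a re-check of this kind is the right fix for the real handler is an open question; the sketch only shows why a stale scan result is dangerous.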
> ServerShutDownHandler And Disable/Delete should not happen parallely leading
> to recreation of regions that were deleted
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.4
> Reporter: ramkrishna.s.vasudevan
> Priority: Blocker
>
> ServerShutdownHandler and the disable/delete table handlers race. This is not an
> issue due to TM.
> -> A regionserver goes down. In our cluster the regionserver holds a lot of
> regions.
> -> A region R1 has two daughters, D1 and D2.
> -> The ServerShutdownHandler gets called, scans META and gets all the
> user regions.
> -> In parallel, a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META
> is cleaned.)
> -> Now ServerShutdownHandler starts processDeadRegion:
> {code}
> if (hri.isOffline() && hri.isSplit()) {
>   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>     "; checking daughter presence");
>   fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, since the daughters D1 and D2 are missing for R1
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
>   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>   MetaEditor.addDaughter(catalogTracker, daughter, null);
>   // TODO: Log WARN if the regiondir does not exist in the fs. If its not
>   // there then something wonky about the split -- things will keep going
>   // but could be missing references to parent region.
>   // And assign it.
>   assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.
> After this, we continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
>     this.services.getAssignmentManager(),
>     this.server.getCatalogTracker())) {
>   this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> When the SSH scanned META, it saw R1, D1 and D2.
> So as part of the above code, D1 and D2, which were already assigned by
> fixupDaughters, are assigned again by
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper failure due to a bad version, which kills the master.
> The important part here is that the regions that were deleted are recreated,
> which I think is more critical.
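
The "bad version" failure from a double assign can be illustrated with a tiny optimistic-concurrency model. This is a sketch under assumptions, not the ZooKeeper or ZKAssign API: `VersionedNodeSketch` and its methods are hypothetical, and the real code path that raises the error in the master may differ. The state names are taken from the log above only as labels.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical versioned node; models optimistic concurrency, not the ZooKeeper API. */
public class VersionedNodeSketch {
    private final AtomicInteger version = new AtomicInteger(0);
    private volatile String state = "OFFLINE";

    public int version() {
        return version.get();
    }

    public String state() {
        return state;
    }

    /** Updates the state only if the caller's expected version is still current. */
    public boolean setState(String newState, int expectedVersion) {
        if (!version.compareAndSet(expectedVersion, expectedVersion + 1)) {
            return false; // analogous to a "bad version" failure
        }
        state = newState;
        return true;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch();

        // Both assign attempts read the node before either of them writes.
        int seenByFixup = node.version();
        int seenBySecondAssign = node.version();

        // The first writer wins and bumps the version.
        System.out.println(node.setState("RS_ZK_REGION_OPENING", seenByFixup));
        // The second writer still holds the old version, so its update is rejected.
        System.out.println(node.setState("RS_ZK_REGION_OPENING", seenBySecondAssign));
    }
}
```

In this model the second caller gets a rejection it can handle. The description above reports that in the master the equivalent failure is fatal.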