[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

Zhihong Yu (Commented) (JIRA) Thu, 12 Jan 2012 09:44:10 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185095#comment-13185095
 ]


Zhihong Yu commented on HBASE-5155:
-----------------------------------

In AssignmentManager.java, setEnabledTable():
{code}
+      LOG.error("Unable to ensure that the table will be"
+          + " enabled because of a ZooKeeper issue");
{code}
Please include tableName in the log.
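
For illustration, the message could be built along these lines. This is only a sketch: the class and the helper enableFailureMessage() are made up for this example, and in AssignmentManager the string would simply be passed to the existing LOG.error(...) call.

```java
// Hypothetical sketch: class and method names are illustrative, not from the patch.
class EnableTableLogSketch {
  // Builds the error text with the table name included, so the failing
  // table can be identified from the master log.
  static String enableFailureMessage(String tableName) {
    return "Unable to ensure that the table " + tableName
        + " will be enabled because of a ZooKeeper issue";
  }

  public static void main(String[] args) {
    // In the real code this string would go to LOG.error(...).
    System.out.println(enableFailureMessage("t1"));
  }
}
```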

In bulkAssignUserRegions():
{code}
+    List<HRegionInfo> regionsList = java.util.Arrays.asList(regions);
+    for (HRegionInfo regionInfo : regionsList) {
{code}
Can we iterate directly over the regions array?
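
A minimal sketch of what I mean, with String standing in for HRegionInfo (the class and method names here are hypothetical). The enhanced for loop accepts an array as is, so the Arrays.asList() wrapper adds nothing when the list is only iterated:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: String stands in for HRegionInfo.
class IterateRegionsSketch {
  // Iterates over the array directly; no intermediate List is created
  // just for the loop.
  static List<String> collect(String[] regions) {
    List<String> seen = new ArrayList<String>();
    for (String regionInfo : regions) {
      seen.add(regionInfo);
    }
    return seen;
  }
}
```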

In ZKTable.java:
{code}
-      if (!isEnabledOrDisablingTable(tableName)) {
+      if (isEnabledOrDisablingTable(tableName)) {
         LOG.warn("Moving table " + tableName + " state to disabling but was " +
           "not first in enabled state: " + this.cache.get(tableName));
{code}
Why was the above change necessary? Now the warning doesn't match the check: it would be logged when the table is in the enabled or disabling state.
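
To spell out the mismatch, here is a toy model of the check. It is not the real ZKTable: the enum, the map-based cache and shouldWarnOnDisabling() are assumptions made for this example, and a table missing from the map is treated as not enabled, which may differ from the real class. The point is only that the "not first in enabled state" warning makes sense under the negated condition:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the check; not the real ZKTable class.
class DisablingCheckSketch {
  enum TableState { ENABLED, DISABLING, DISABLED, ENABLING }

  private final Map<String, TableState> cache = new HashMap<String, TableState>();

  void put(String tableName, TableState state) {
    cache.put(tableName, state);
  }

  boolean isEnabledOrDisablingTable(String tableName) {
    TableState s = cache.get(tableName);
    return s == TableState.ENABLED || s == TableState.DISABLING;
  }

  // True when the "not first in enabled state" warning applies, i.e. when
  // the table is NOT enabled or disabling (the condition before the patch).
  boolean shouldWarnOnDisabling(String tableName) {
    return !isEnabledOrDisablingTable(tableName);
  }
}
```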

I see a long line that should be wrapped:
{code}
      TEST_UTIL.createTable(TABLENAME, FAMILYNAME);
+      assertTrue(m.assignmentManager.getZKTable().isEnabledTable(Bytes.toString(TABLENAME)));
{code}

Overall, this patch looks very good.
Thanks for plugging a hole w.r.t. the cache in ZKTable.
                
> ServerShutDownHandler And Disable/Delete should not happen parallely leading 
> to recreation of regions that were deleted
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5155
>                 URL: https://issues.apache.org/jira/browse/HBASE-5155
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.4
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Blocker
>         Attachments: HBASE-5155_latest.patch
>
>
> ServerShutdownHandler and the disable/delete table handlers race.  This is not an 
> issue due to TM.
> -> A regionserver goes down.  In our cluster the regionserver holds a lot of 
> regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called, scans META and gets all the 
> user regions.
> -> In parallel a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META 
> is cleaned.)
> -> Now ServerShutdownHandler starts processDeadRegion:
> {code}
>  if (hri.isOffline() && hri.isSplit()) {
>       LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
>         "; checking daughter presence");
>       fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, as the daughters D1 and D2 are missing for R1 
> {code}
>     if (isDaughterMissing(catalogTracker, daughter)) {
>       LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
>       MetaEditor.addDaughter(catalogTracker, daughter, null);
>       // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
>       // there then something wonky about the split -- things will keep going
>       // but could be missing references to parent region.
>       // And assign it.
>       assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.  
> After this we continue with the code below.
> {code}
>         if (processDeadRegion(e.getKey(), e.getValue(),
>             this.services.getAssignmentManager(),
>             this.server.getCatalogTracker())) {
>           this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> Now when the SSH scanned the META it had R1, D1 and D2.
> So as part of the above code D1 and D2, which were assigned by fixupDaughters,
> are assigned again by 
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a ZooKeeper issue due to a bad version, which kills the master.
> The important part here is that the regions that were deleted are recreated, which 
> I think is more critical.
