[
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186084#comment-13186084
]
ramkrishna.s.vasudevan commented on HBASE-5155:
-----------------------------------------------
bq.This looks like a method used internally by AM only. Does it need to be
public?
{code}
+ public void setEnabledTable(String tableName) {
{code}
I did not have this as public in the beginning, but later I had to set the
enabled table state in HMaster.rebuildUserRegions(). So I thought of exposing
this from AM so that I can call it there instead of repeating the same code in
HMaster.
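Roughly, the exposed helper is of this shape (just a sketch, not the exact
patch; the abort-on-error handling shown here is illustrative):
{code}
// Sketch: AM exposes this so HMaster.rebuildUserRegions() can reuse it instead
// of repeating the ZKTable plumbing. Error handling here is illustrative only.
public void setEnabledTable(String tableName) {
  try {
    // Record the ENABLED state for this table in zookeeper.
    this.zkTable.setEnabledTable(tableName);
  } catch (KeeperException e) {
    // We are in the startup/rebuild flow, so aborting the master is acceptable.
    this.master.abort("Could not set ENABLED state for table " + tableName, e);
  }
}
{code}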
bq.Have you tried it? In rolling restart we'll upgrade the master first
usually. Won't it know how to deal w/ new zk node for ENABLED state?
Even if the master is restarted first, the above changes are still necessary:
when the master rebuilds the table state it will not find the ENABLED state in
zk, so the changes in Master are what let it build that state. Yes, rolling
restart was tested.
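For illustration, the rebuild step does something along these lines (sketch
only; userTablesFoundInMeta is a made-up name for the tables seen while
scanning META):
{code}
// Sketch of the rebuild path on (rolling) restart: if a user table has neither
// DISABLED nor ENABLED recorded in zk (an old master never wrote ENABLED),
// record ENABLED now so later state checks behave consistently.
for (String tableName : userTablesFoundInMeta) {
  if (!zkTable.isDisabledTable(tableName) && !zkTable.isEnabledTable(tableName)) {
    zkTable.setEnabledTable(tableName);   // creates the missing ENABLED node
  }
}
{code}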
bq.FYI, don't do these kinda changes in future:
That happened when applying a formatter. Sure Stack, I will take care of such
changes in future.
{code}public boolean isEnabledTable(String tableName) {
- synchronized (this.cache) {
- // No entry in cache means enabled table.
- return !this.cache.containsKey(tableName);
- }
+ return isTableState(tableName, TableState.ENABLED);
{code}
isTableState() takes synchronized (this.cache) internally anyway, so it should be ok?
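For reference, the shared helper looks roughly like this (sketch):
{code}
// Sketch of the shared helper: the synchronization moves into isTableState(),
// so callers such as isEnabledTable() stay thread-safe without their own block.
private boolean isTableState(final String tableName, final TableState state) {
  synchronized (this.cache) {
    TableState currentState = this.cache.get(tableName);
    return currentState != null && currentState.equals(state);
  }
}
{code}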
{code}
+ * Check if the table is in DISABLED state in cache
{code}
My idea in adding 'in cache' was that the state is checked only in memory and
we do not go to zk to check it. So I thought the words 'in cache' would tell
the user that no ZK read is involved in the check.
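To make the intent concrete, the kind of method that comment documents reads
only the in-memory map (sketch; the exact method it annotates in the patch may
differ):
{code}
/**
 * Check if the table is in DISABLED state in cache, i.e. only the in-memory
 * map mirrored from zookeeper is consulted; no zookeeper read is made here.
 */
public boolean isDisabledTable(final String tableName) {
  return isTableState(tableName, TableState.DISABLED);
}
{code}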
{code}
+ // Enable the ROOT table if on process fail over the RS containing ROOT
+ // was active.
{code}
This scenario arises when the master is restarted but the RS carrying ROOT is
still alive. The master should then mark ROOT and META as enabled too, because
when it comes up it should create the ENABLED node in zk.
If we skip this step, ROOT and META will have no node in zk in the above
scenario, whereas if the master explicitly assigns ROOT and META a zk node is
created for them.
So to unify the two cases I had to call zkTable.setEnabledTable().
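Concretely, the failover handling is of this shape (illustrative sketch;
rootWasAlreadyOnline / metaWasAlreadyOnline are made-up names for the "RS still
holds the catalog region" check):
{code}
// Sketch of the failover case: the RS hosting -ROOT-/.META. survived the master
// restart, so the new master never assigns them itself and must still record
// their ENABLED state in zk to keep both code paths uniform.
if (rootWasAlreadyOnline) {
  this.assignmentManager.setEnabledTable(
      Bytes.toString(HConstants.ROOT_TABLE_NAME));   // "-ROOT-"
}
if (metaWasAlreadyOnline) {
  this.assignmentManager.setEnabledTable(
      Bytes.toString(HConstants.META_TABLE_NAME));   // ".META."
}
{code}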
@Stack
Is this fine, Stack? I can rework the patch based on your feedback and then
upload a final one.
@Ted
Do you have any more comments or feedback that I can incorporate in the next
patch?
> ServerShutDownHandler And Disable/Delete should not happen parallely leading
> to recreation of regions that were deleted
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5155
> URL: https://issues.apache.org/jira/browse/HBASE-5155
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.4
> Reporter: ramkrishna.s.vasudevan
> Priority: Blocker
> Fix For: 0.90.6
>
> Attachments: HBASE-5155_1.patch, HBASE-5155_latest.patch,
> hbase-5155_6.patch
>
>
> ServerShutDownHandler and the disable/delete table handlers race with each
> other. This is not an issue due to TM.
> -> A regionserver goes down. In our cluster the regionserver holds a lot of
> regions.
> -> A region R1 has two daughters D1 and D2.
> -> The ServerShutdownHandler gets called and scans the META and gets all the
> user regions
> -> In parallel, a table is disabled. (No problem in this step.)
> -> Delete table is done.
> -> The table and its regions are deleted, including R1, D1 and D2. (So META
> is cleaned.)
> -> Now ServerShutdownHandler starts to process the dead server's regions
> (processDeadRegion):
> {code}
> if (hri.isOffline() && hri.isSplit()) {
> LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
> "; checking daughter presence");
> fixupDaughters(result, assignmentManager, catalogTracker);
> {code}
> As part of fixupDaughters, since the daughters D1 and D2 are missing for R1:
> {code}
> if (isDaughterMissing(catalogTracker, daughter)) {
> LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
> MetaEditor.addDaughter(catalogTracker, daughter, null);
> // TODO: Log WARN if the regiondir does not exist in the fs. If its not
> // there then something wonky about the split -- things will keep going
> // but could be missing references to parent region.
> // And assign it.
> assignmentManager.assign(daughter, true);
> {code}
> we call assign on the daughters.
> After this, we again continue with the code below.
> {code}
> if (processDeadRegion(e.getKey(), e.getValue(),
> this.services.getAssignmentManager(),
> this.server.getCatalogTracker())) {
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> Now, when the SSH scanned META it still had R1, D1 and D2.
> So, as part of the above code, D1 and D2, which were already assigned by
> fixupDaughters, are assigned again by
> {code}
> this.services.getAssignmentManager().assign(e.getKey(), true);
> {code}
> This leads to a zookeeper bad-version issue that kills the master.
> The important part here is that the regions that were deleted get recreated,
> which I think is the more critical problem.