[ https://issues.apache.org/jira/browse/HBASE-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207785#comment-13207785 ]

ramkrishna.s.vasudevan commented on HBASE-5200:
-----------------------------------------------

@Stack
First of all, thanks for the test case.
{code}
Would the fb trick of NOT processing callbacks during master failover help 
here? At least for the scope of the AM.joinCluster?
{code}
I have not gone through this part yet, as I did not find the time.
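The "fb trick" is not spelled out in this thread. Under the assumption that it means queuing ZooKeeper callbacks while the master is in failover and replaying them once AM.joinCluster has populated RIT, a minimal sketch could look like this. DeferringEventGate, onNodeDataChanged and failoverComplete are hypothetical names, not HBase API, and the String event stands in for the real RegionTransitionData.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: while failover (joinCluster) is running, ZooKeeper
 * callbacks are queued instead of handled; afterwards they are replayed
 * in arrival order. Names are illustrative only, not HBase API.
 */
class DeferringEventGate {
  private final Consumer<String> handler;        // stands in for AM.handleRegion()
  private final List<String> deferred = new ArrayList<>();
  private boolean failoverInProgress = true;

  DeferringEventGate(Consumer<String> handler) {
    this.handler = handler;
  }

  /** Called from the ZooKeeper event thread. */
  synchronized void onNodeDataChanged(String regionEvent) {
    if (failoverInProgress) {
      // Do not race with the failover path; park the event for later.
      deferred.add(regionEvent);
    } else {
      handler.accept(regionEvent);
    }
  }

  /** Called once joinCluster() has put every existing RIT node into RIT. */
  synchronized void failoverComplete() {
    failoverInProgress = false;
    // Replay under the lock so newer events cannot overtake queued ones.
    for (String e : deferred) {
      handler.accept(e);
    }
    deferred.clear();
  }
}
```

Holding the lock during replay keeps event order intact, at the cost of briefly blocking the ZooKeeper event thread.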
{code}
Isn't it possible that in processRegionInTransition we may have done this 
already? 
{code}
The checks in handleRegion and processRegionInTransition are mutually exclusive: 
for a given region, the check is done in only one of the two places.
{code}
It seems wrong that we are putting stuff into RIT in two places; in 
processRegionsInTransition and in handleRegion if we happen to be fielding a 
call back before failover has had a chance to run.
{code}
Although we do this in two places, only one of processRegionInTransition or 
handleRegion will execute for a given region, so populating RIT in each of them 
is needed for the current flow to proceed.
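A minimal sketch of how the two entry points can stay mutually exclusive per region, assuming an atomic putIfAbsent on the RIT map decides which path populates it. RitPopulation, populateFromFailover and populateFromCallback are hypothetical names used only for illustration; this is not the code in the attached patches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch: both the failover path and the ZK callback path may
 * try to put a region into RIT, but putIfAbsent lets exactly one of them
 * populate the entry for a given region.
 */
class RitPopulation {
  private final ConcurrentMap<String, String> regionsInTransition =
      new ConcurrentHashMap<>();

  /** Failover path: returns true if this call populated RIT. */
  boolean populateFromFailover(String encodedName, String zkState) {
    return regionsInTransition.putIfAbsent(encodedName, zkState + " (failover)") == null;
  }

  /** Callback path: returns true if this call populated RIT. */
  boolean populateFromCallback(String encodedName, String zkState) {
    return regionsInTransition.putIfAbsent(encodedName, zkState + " (callback)") == null;
  }

  String get(String encodedName) {
    return regionsInTransition.get(encodedName);
  }
}
```

Whichever path runs first for a region owns its RIT entry; the other sees the existing entry and does not populate it again.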
{code}
 applied to TRUNK.

However, TestAssignmentManager#testBalanceOnMasterFailover fails with or 
without the patch.

{code}

The test case had a few problems:
-> The region was not transitioned after the callback for the CLOSED transition 
triggered its assignment, so there was no RS to process the assign.
-> The gate variable was not being reset.
-> We only get a callback after ZKAssign.getDataAndWatch has been called, but in 
the test case we were getting a callback right after am.joinCluster.
So I have made some modifications.  
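For a test harness, the watch semantics behind the third point can be modelled with a small in-memory stand-in: a change is delivered only to a watcher registered by an earlier getDataAndWatch call, and each watch fires once, as ZooKeeper watches are one-shot. FakeZk and its methods are hypothetical test helpers; they are not ZKAssign or the ZooKeeper client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical in-memory stand-in for ZooKeeper watch semantics in tests.
 * A data change fires only a watch set by a prior getDataAndWatch(), and
 * each watch fires at most once. Not the real ZKAssign/ZooKeeper API.
 */
class FakeZk {
  private final Map<String, String> data = new HashMap<>();
  private final Map<String, BiConsumer<String, String>> watches = new HashMap<>();

  synchronized void create(String node, String value) {
    data.put(node, value);
  }

  /** Reads the node and leaves a one-shot watch on it. */
  synchronized String getDataAndWatch(String node, BiConsumer<String, String> watcher) {
    watches.put(node, watcher);
    return data.get(node);
  }

  /** Updates the node; fires the watch only if one was registered. */
  void setData(String node, String value) {
    BiConsumer<String, String> watcher;
    synchronized (this) {
      data.put(node, value);
      watcher = watches.remove(node); // one-shot: consumed on first fire
    }
    if (watcher != null) {
      watcher.accept(node, value);    // callback runs outside the lock
    }
  }
}
```

With this helper a test cannot observe a callback before the code under test has set the watch, which mirrors the ordering described above.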

Once again, thanks for the test case, which helped verify the scenarios.  
Please provide your suggestions.  
As for the FB approach, I need some time to check it and implement it here, if we 
decide to go that way.
                
> AM.ProcessRegionInTransition() and AM.handleRegion() race thus leaving the 
> region assignment inconsistent
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5200
>                 URL: https://issues.apache.org/jira/browse/HBASE-5200
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.5
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.94.0, 0.90.7, 0.92.1
>
>         Attachments: 5200-test.txt, 5200-v2.txt, HBASE-5200.patch, 
> HBASE-5200_1.patch, HBASE-5200_trunk_latest_with_test_2.patch, 
> TEST-org.apache.hadoop.hbase.master.TestRestartCluster.xml, 
> hbase-5200_90_latest.patch
>
>
> This is the scenario:
> Consider a case where the balancer is running and is trying to close regions 
> on an RS.
> Before the close completes, a master switch happens.  
> On master switch, the set of nodes that are in RIT is collected, and for each 
> node we first get its data and start watching it.
> Only after that is the node data added into RIT.
> If, in this window (before the node is added to RIT), the RS to which the close 
> was sent transitions the node, AM.handleRegion() drops the event because the 
> RIT state is null.
> {code}
> 2012-01-13 10:50:46,358 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> a66d281d231dfcaea97c270698b26b6f from server 
> HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,358 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> c12e53bfd48ddc5eec507d66821c4d23 from server 
> HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,358 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> 59ae13de8c1eb325a0dd51f4902d2052 from server 
> HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,359 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> f45bc9614d7575f35244849af85aa078 from server 
> HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,359 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> cc3ecd7054fe6cd4a1159ed92fd62641 from server 
> HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,359 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> 3af40478a17fee96b4a192b22c90d5a2 from server 
> HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,359 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> e6096a8466e730463e10d3d61f809b92 from server 
> HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,359 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> 4806781a1a23066f7baed22b4d237e24 from server 
> HOST-192-168-47-204,20020,1326342744518 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> 2012-01-13 10:50:46,359 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Received CLOSED for region 
> d69e104131accaefe21dcc01fddc7629 from server 
> HOST-192-168-47-205,20020,1326363111288 but region was in  the state null and 
> not in expected PENDING_CLOSE or CLOSING states
> {code}
> In the branch, the CLOSING node is created by the RS, which leads to even more 
> inconsistency.
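A minimal, single-threaded model of the window described in the issue, with the RS's CLOSED callback injected between setting the watch and populating RIT. RitRaceModel and its methods are hypothetical and only mimic the shape of AM.handleRegion(). The "atomic" variant shows one possible way to close the window (doing the watch and the RIT insert under the same lock handleRegion() takes); it is an assumption for illustration, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the HBASE-5200 window. Failover reads a RIT znode
 * and sets a watch, then adds the region to the in-memory RIT map. A CLOSED
 * event handled in between finds no state and is dropped.
 */
class RitRaceModel {
  private final Map<String, String> regionsInTransition = new HashMap<>();
  private final StringBuilder log = new StringBuilder();

  /** Stand-in for AM.handleRegion(): CLOSED is accepted only from PENDING_CLOSE/CLOSING. */
  void handleRegion(String region, String event) {
    synchronized (regionsInTransition) {
      String state = regionsInTransition.get(region);
      if (!"PENDING_CLOSE".equals(state) && !"CLOSING".equals(state)) {
        log.append("WARN Received ").append(event)
           .append(" for region ").append(region)
           .append(" but region was in the state ").append(state)
           .append('\n');
        return; // event dropped
      }
      regionsInTransition.put(region, event);
    }
  }

  /** Buggy ordering: watch set, callback slips in, RIT populated afterwards. */
  void failoverWithGap(String region, String znodeState, Runnable callbackInGap) {
    // Step 1 (assumed done): getDataAndWatch() read znodeState and set a watch.
    // Step 2: the RS transitions the znode and the callback runs now.
    callbackInGap.run();
    // Step 3: RIT is populated from the stale data read in step 1.
    synchronized (regionsInTransition) {
      regionsInTransition.put(region, znodeState);
    }
  }

  /** Alternative ordering: watch and RIT insert happen under one lock. */
  void failoverAtomic(String region, String znodeState, Runnable pendingCallback) {
    synchronized (regionsInTransition) {
      // A handleRegion() call on the ZK event thread would block here.
      regionsInTransition.put(region, znodeState);
    }
    // The pending callback then sees a known RegionState.
    pendingCallback.run();
  }

  String state(String region) {
    synchronized (regionsInTransition) {
      return regionsInTransition.get(region);
    }
  }

  String log() {
    synchronized (regionsInTransition) {
      return log.toString();
    }
  }
}
```

In the gap variant the CLOSED event is logged and dropped, and RIT is left holding the stale CLOSING state, which matches the inconsistency in the warnings above.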

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
