[ 
https://issues.apache.org/jira/browse/HBASE-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348194#comment-14348194
 ] 

zhangduo commented on HBASE-13150:
----------------------------------

OK, is this a possible race condition?

When moving a region, we call invokeAssign at the end of onRegionClosed, 
which schedules an async assign task.

There are lots of places where we check whether the table is being disabled, 
but we can pass all of those checks and still arrive here:
{code:title=AssignmentManager.java}
  /**
   * Caller must hold lock on the passed <code>state</code> object.
   * @param state
   * @param forceNewPlan
   */
  private void assign(RegionState state, boolean forceNewPlan) {
        ...
        // In case of assignment from EnableTableHandler table state is ENABLING. Any how
        // EnableTableHandler will set ENABLED after assigning all the table regions. If we
        // try to set to ENABLED directly then client API may think table is enabled.
        // When we have a case such as all the regions are added directly into hbase:meta and we call
        // assignRegion then we need to make the table ENABLED. Hence in such case the table
        // will not be in ENABLING or ENABLED state.
        TableName tableName = region.getTable();
        if (!tableStateManager.isTableState(tableName,
          TableState.State.ENABLED, TableState.State.ENABLING)) {
          LOG.debug("Setting table " + tableName + " to ENABLED state.");
          setEnabledTable(tableName);
        }
        ...
  }
{code}
Notice that here we only hold the region lock, not the table lock. So suppose 
this thread is paused at this point and the DisableTableHandler thread starts 
running. In its prepare method, it sets the table state to DISABLING under the 
protection of the table lock. Then, in handleDisableTable, it sets the state to 
DISABLING one more time (without any lock):
{code:title=DisableTableHandler.java}
  public DisableTableHandler prepare()
      throws TableNotFoundException, TableNotEnabledException, IOException {
      ...
      // There could be multiple client requests trying to disable or enable
      // the table at the same time. Ensure only the first request is honored
      // After that, no other requests can be accepted until the table reaches
      // DISABLED or ENABLED.
      //TODO: reevaluate this since we have table locks now
      if (!skipTableStateCheck) {
        if (!this.assignmentManager.getTableStateManager().setTableStateIfInStates(
          this.tableName, TableState.State.DISABLING,
          TableState.State.ENABLED)) {
          LOG.info("Table " + tableName + " isn't enabled; skipping disable");
          throw new TableNotEnabledException(this.tableName);
        }
      }
      ...
  }

  private void handleDisableTable() throws IOException {
    // Set table disabling flag up in zk.
    this.assignmentManager.getTableStateManager().setTableState(this.tableName,
      TableState.State.DISABLING);
    ..
  }
{code}
After this, the DisableTableHandler enters a 'while(true)' loop that does not 
break until all regions are offline. Now suppose the DisableTableHandler thread 
is paused. Since no lock protects the table state here, the assign thread is 
free to wake up. It does the check, finds that the table is not in ENABLED or 
ENABLING state, and sets it to ENABLED. The region is then assigned again, so 
the DisableTableHandler will wait forever.
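
To make the interleaving above concrete, here is a minimal, self-contained sketch that replays it in a single thread. It is not HBase code: the class TableStateRaceSketch, its methods, and the AtomicReference standing in for the state kept by the TableStateManager are all hypothetical. The assignGuarded variant is only an assumed illustration of one possible guard, not necessarily the fix for this issue.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the table state; NOT actual HBase code.
final class TableStateRaceSketch {
    enum State { ENABLED, ENABLING, DISABLING, DISABLED }

    // Stand-in for the table state stored by the TableStateManager.
    private final AtomicReference<State> state = new AtomicReference<>(State.ENABLED);

    // Models setTableStateIfInStates(table, DISABLING, ENABLED) in prepare().
    boolean prepareDisable() {
        return state.compareAndSet(State.ENABLED, State.DISABLING);
    }

    // Models the unconditional setTableState(table, DISABLING) in handleDisableTable().
    void handleDisableTable() {
        state.set(State.DISABLING);
    }

    // Models the check in assign(): any state other than ENABLED or ENABLING
    // is overwritten with ENABLED, including DISABLING.
    void assignUnguarded() {
        State s = state.get();
        if (s != State.ENABLED && s != State.ENABLING) {
            state.set(State.ENABLED);
        }
    }

    // Assumed guard (illustrative only): refuse to assign while the table is
    // being disabled or is already disabled. Returns true if assign may proceed.
    boolean assignGuarded() {
        State s = state.get();
        return s != State.DISABLING && s != State.DISABLED;
    }

    State current() {
        return state.get();
    }

    // Replays the order of events from the comment: prepare(), then
    // handleDisableTable(), then the delayed assign task wakes up.
    static State replay(boolean guarded) {
        TableStateRaceSketch table = new TableStateRaceSketch();
        table.prepareDisable();
        table.handleDisableTable();
        if (guarded) {
            table.assignGuarded();
        } else {
            table.assignUnguarded();
        }
        return table.current();
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + replay(false));
        System.out.println("guarded:   " + replay(true));
    }
}
```

In the unguarded replay the table ends up ENABLED even though a disable is in progress, which matches the hang described above. Note that the guard in this sketch is still a separate check and act, so in real code it would probably have to run under the same table lock as the disable handler to close the window completely.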

Here is the log
{noformat}
2015-03-03 22:17:04,754 DEBUG [RS_CLOSE_REGION-asf900:44226-2] handler.CloseRegionHandler(122): Closed testRegionTransitionOperations,yyy,1425421019650.e2998ed5019dc219cc3c0112541c0cdf.
2015-03-03 22:17:04,758 INFO  [MASTER_TABLE_OPERATIONS-asf900:35522-0] handler.DisableTableHandler(123): Attempting to disable table testRegionTransitionOperations
2015-03-03 22:17:04,758 DEBUG [AM.-pool2-t13] master.AssignmentManager(1203): Found an existing plan for testRegionTransitionOperations,yyy,1425421019650.e2998ed5019dc219cc3c0112541c0cdf. destination server is asf900.gq1.ygridcore.net,45988,1425421016996 accepted as a dest server = true
2015-03-03 22:17:04,759 DEBUG [AM.-pool2-t13] master.AssignmentManager(1243): Using pre-existing plan for testRegionTransitionOperations,yyy,1425421019650.e2998ed5019dc219cc3c0112541c0cdf.;
2015-03-03 22:17:04,764 DEBUG [AM.-pool2-t13] master.AssignmentManager(1020): Setting table testRegionTransitionOperations to ENABLED state.
2015-03-03 22:17:04,765 INFO  [MASTER_TABLE_OPERATIONS-asf900:35522-0] hbase.MetaTableAccessor(1430): Updated table testRegionTransitionOperations state to DISABLING in META
2015-03-03 22:17:04,765 INFO  [MASTER_TABLE_OPERATIONS-asf900:35522-0] handler.DisableTableHandler(166): Offlining 26 regions
2015-03-03 22:17:04,770 INFO  [AM.-pool2-t13] hbase.MetaTableAccessor(1430): Updated table testRegionTransitionOperations state to ENABLED in META
2015-03-03 22:17:04,770 INFO  [AM.-pool2-t13] master.AssignmentManager(1023): Assigning testRegionTransitionOperations,yyy,1425421019650.e2998ed5019dc219cc3c0112541c0cdf. to asf900.gq1.ygridcore.net,45988,1425421016996
{noformat}

> TestMasterObserver failing disable table at end of test
> -------------------------------------------------------
>
>                 Key: HBASE-13150
>                 URL: https://issues.apache.org/jira/browse/HBASE-13150
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: stack
>            Assignee: stack
>
> I see in 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/6202/testReport/junit/org.apache.hadoop.hbase.coprocessor/TestMasterObserver/testRegionTransitionOperations/
>   , now we have added in timeouts, that we are failing to disable a table. It 
> looks like table is disabled but regions are being opened on the disabled 
> table still, like HBASE-6537
> Let me see if can figure why this happening. Will be back.


