Hi all, 
    I found a situation that may cause a region to stay closed forever. It
happens often on my cluster (version 0.98.10), and 1.1.2 has the same
problem:
    1, master sends an open-region request to the regionserver
    2, rs starts a handler to open the region
    3, rs returns the response to master
    4, master does not receive the response, or times out, and sends the
open-region request again
    5, rs has already opened the region
    6, master calls processAlreadyOpenedRegion and updates the region state to
open in master memory
    7, master receives the zk message that the region was opened (late for some
reason, e.g. network), which triggers an update of the region state to open, but
master finds the region is already open: ERROR!
    8, master sends close region, and the region stays closed forever.
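
The sequence above can be walked through in a toy model. This is only a sketch, not HBase code: ToyMaster, its State enum and its method names are hypothetical stand-ins for the master's in-memory region state, and it assumes the master reacts to an unexpected "opened" event by closing the region, as the last two steps describe.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the master's in-memory region states; not HBase code. */
class ToyMaster {
    enum State { PENDING_OPEN, OPEN, CLOSED }

    private final Map<String, State> regionStates = new HashMap<>();

    /** Master sends (or resends) an open request for the region. */
    void sendOpen(String region) {
        regionStates.put(region, State.PENDING_OPEN);
    }

    /** The rs answers the retried request: the region is already open. */
    void onAlreadyOpened(String region) {
        regionStates.put(region, State.OPEN);
    }

    /** The (possibly late) "region opened" message from zk arrives. */
    void onZkOpened(String region) {
        if (regionStates.get(region) == State.PENDING_OPEN) {
            // Normal path: the master was still waiting for this event.
            regionStates.put(region, State.OPEN);
        } else {
            // Assumed behaviour: the event is unexpected, so the master
            // closes the region and nothing reopens it afterwards.
            regionStates.put(region, State.CLOSED);
        }
    }

    State stateOf(String region) {
        return regionStates.get(region);
    }
}
```

Calling sendOpen, sendOpen, onAlreadyOpened and then onZkOpened for the same region leaves it in CLOSED, while the same calls without onAlreadyOpened leave it in OPEN.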

Maybe a solution is to change processAlreadyOpenedRegion in class
AssignmentManager:

  private void processAlreadyOpenedRegion(HRegionInfo region, ServerName sn) {
    // Remove region from in-memory transition and unassigned node from ZK
    // While trying to enable the table the regions of the table were
    // already enabled.
    LOG.debug("ALREADY_OPENED " + region.getRegionNameAsString()
      + " to " + sn);
    String encodedName = region.getEncodedName();

    /*
     * Check the region state in zk; if it is already opened, return and
     * leave the regionStates update to be triggered by the zk state change.
     */

    deleteNodeInStates(encodedName, "offline", sn,
      EventType.M_ZK_REGION_OFFLINE);
    regionStates.regionOnline(region, sn);
  }
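
One way to read the proposed check, as a self-contained sketch rather than a patch against AssignmentManager: GuardedMaster, the zkOpened set and rsMarksOpenedInZk are hypothetical names, and the set merely stands in for "the region's zk node is already in the opened state". The real lookup would have to be written against the actual zk helpers, which are not shown here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the guarded handler; not HBase code. */
class GuardedMaster {
    enum State { PENDING_OPEN, OPEN, CLOSED }

    private final Map<String, State> regionStates = new HashMap<>();
    // Hypothetical stand-in for the opened state recorded in zk.
    private final Set<String> zkOpened = new HashSet<>();

    void sendOpen(String region) {
        regionStates.put(region, State.PENDING_OPEN);
    }

    /** The rs records "opened" in zk; the master has not seen the event yet. */
    void rsMarksOpenedInZk(String region) {
        zkOpened.add(region);
    }

    void processAlreadyOpenedRegion(String region) {
        if (zkOpened.contains(region)) {
            // Proposed check: zk already says opened, so return and let the
            // pending zk event perform the regionStates update.
            return;
        }
        regionStates.put(region, State.OPEN);
    }

    /** Same assumed handling of the zk "opened" event as described above. */
    void onZkOpened(String region) {
        if (regionStates.get(region) == State.PENDING_OPEN) {
            regionStates.put(region, State.OPEN);
        } else {
            regionStates.put(region, State.CLOSED);
        }
        zkOpened.remove(region);
    }

    State stateOf(String region) {
        return regionStates.get(region);
    }
}
```

With the guard in place, the late zk event finds the region still in PENDING_OPEN and moves it to OPEN instead of triggering a close. When zk holds no opened state, the handler marks the region OPEN as before.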


zhou_shuaif...@sina.com
