Improve error message when Master fail-over happens and ZK unassigned node 
contains stale znode(s)
--------------------------------------------------------------------------------------------------

                 Key: HBASE-5181
                 URL: https://issues.apache.org/jira/browse/HBASE-5181
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.5, 0.92.0
            Reporter: Mubarak Seyed
            Priority: Minor


When a master fail-over happens while there are a number of regions in 
transition (RITs) under /hbase/unassigned, and some of the znodes (encoded 
region names) under /hbase/unassigned are stale, the new master aborts with

{code}
2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover
2011-12-30 10:27:36,002 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 1717 regions in transition
2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.ArrayIndexOutOfBoundsException: -256
    at org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148)
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105)
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
    at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
    at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743)
    at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262)
    at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
{code}

and nothing in the log indicates which stale znode(s) need to be cleaned up 
from unassigned using zkCli.sh (del /hbase/unassigned/<bad region name>). It 
would be good to include the bad region name in the IOException thrown from 
RegionTransitionData.readFields().

{code}
  @Override
  public void readFields(DataInput in) throws IOException {
    // the event type byte
    eventType = EventType.values()[in.readShort()];
    // the timestamp
    stamp = in.readLong();
    // the encoded name of the region being transitioned
    regionName = Bytes.readByteArray(in);
    // remaining fields are optional so prefixed with boolean
    // the name of the regionserver sending the data
    if (in.readBoolean()) {
      byte [] versionedBytes = Bytes.readByteArray(in);
      this.origin = ServerName.parseVersionedServerName(versionedBytes);
    }
    if (in.readBoolean()) {
      this.payload = Bytes.readByteArray(in);
    }
  }
{code}
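
The only array lookup in this method is EventType.values()[in.readShort()], so the -256 in the trace is most likely the short read from the stale znode's data. Below is a minimal sketch of that failure mode. The enum is a stand-in rather than HBase's real EventType, and the two-byte payload 0xFF 0x00 is an assumption chosen only because it decodes to -256; the actual stale bytes are not known.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NegativeOrdinalSketch {
  // Stand-in for HBase's EventType; the constants are illustrative only.
  public enum EventType { A, B, C }

  // Reads the leading big-endian short, as DataInput.readShort() does
  // on the first line of readFields().
  public static short readOrdinal(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return in.readShort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Mirrors the unchecked lookup: a negative or too-large ordinal
  // throws ArrayIndexOutOfBoundsException instead of an IOException.
  public static EventType readEventType(byte[] data) {
    return EventType.values()[readOrdinal(data)];
  }
}
```

With the assumed payload, readOrdinal returns -256 and readEventType throws ArrayIndexOutOfBoundsException, which is unchecked and so escapes any handling written for IOException.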

If execution gets as far as reading regionName, we can include the regionName 
in the IOException, along with an error message explaining how to clean up the 
stale znode(s) under /hbase/unassigned.
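
A rough sketch of what such a change could look like follows. It is not the actual HBase patch: the class, the stand-in enum, the parse() helper and the extra znodeName argument are all hypothetical, and the region name is read with a plain int length prefix instead of Bytes.readByteArray. Because the trace above fails on the very first field, before regionName is read, the sketch falls back to a znode name supplied by the caller when regionName is not yet available.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StaleZnodeSketch {
  // Stand-in for HBase's EventType; illustrative only.
  public enum EventType { A, B }

  public EventType eventType;
  public long stamp;
  public byte[] regionName;

  // znodeName is a hypothetical argument: the name of the znode the data
  // was read from, used when regionName has not been parsed yet.
  public void readFields(DataInput in, String znodeName) throws IOException {
    regionName = null;
    try {
      int ordinal = in.readShort();
      EventType[] types = EventType.values();
      if (ordinal < 0 || ordinal >= types.length) {
        throw new IOException("Unknown event type ordinal " + ordinal);
      }
      eventType = types[ordinal];
      stamp = in.readLong();
      // Simplified encoding (assumption): int length followed by the bytes.
      int length = in.readInt();
      if (length < 0) {
        throw new IOException("Negative region name length " + length);
      }
      byte[] name = new byte[length];
      in.readFully(name);
      regionName = name;
    } catch (IOException | RuntimeException e) {
      String region = regionName != null
          ? new String(regionName, StandardCharsets.UTF_8)
          : znodeName;
      throw new IOException("Failed to parse region transition data for " + region
          + "; if the znode is stale, remove it with zkCli.sh:"
          + " del /hbase/unassigned/" + region, e);
    }
  }

  // Convenience wrapper for parsing raw znode data.
  public static StaleZnodeSketch parse(byte[] data, String znodeName) throws IOException {
    StaleZnodeSketch result = new StaleZnodeSketch();
    result.readFields(new DataInputStream(new ByteArrayInputStream(data)), znodeName);
    return result;
  }
}
```

With this shape, corrupt data surfaces as an IOException whose message names the znode and the cleanup command, rather than as an unhandled ArrayIndexOutOfBoundsException that shuts the master down.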



