[ https://issues.apache.org/jira/browse/HBASE-5181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184551#comment-13184551 ]
Zhihong Yu commented on HBASE-5181:
-----------------------------------

The message is certainly detailed :-)
Please remember to replace '/hbase' with the value of zookeeper.znode.parent.

> Improve error message when Master fail-over happens and ZK unassigned node contains stale znode(s)
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5181
>                 URL: https://issues.apache.org/jira/browse/HBASE-5181
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.92.0, 0.90.5
>            Reporter: Mubarak Seyed
>            Assignee: Mubarak Seyed
>            Priority: Minor
>              Labels: noob
>
> When a master fail-over happens while a number of regions in transition (RITs) are under
> /hbase/unassigned, and some of the znodes (encoded region names) under /hbase/unassigned
> are stale, we get:
> {code}
> 2011-12-30 10:27:35,623 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover
> 2011-12-30 10:27:36,002 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 1717 regions in transition
> 2011-12-30 10:27:36,004 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
> java.lang.ArrayIndexOutOfBoundsException: -256
>     at org.apache.hadoop.hbase.executor.RegionTransitionData.readFields(RegionTransitionData.java:148)
>     at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:105)
>     at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
>     at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
>     at org.apache.hadoop.hbase.zookeeper.ZKAssign.getData(ZKAssign.java:743)
>     at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:262)
>     at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:223)
>     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:401)
>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:283)
> {code}
> and there is no clue about how to clean up the stale znode(s) from unassigned using zkCli.sh
> (delete /hbase/unassigned/<bad region name>). It would be good to include the bad region name
> in an IOException thrown from RegionTransitionData.readFields().
> {code}
> @Override
> public void readFields(DataInput in) throws IOException {
>   // the event type byte
>   eventType = EventType.values()[in.readShort()];
>   // the timestamp
>   stamp = in.readLong();
>   // the encoded name of the region being transitioned
>   regionName = Bytes.readByteArray(in);
>   // remaining fields are optional so prefixed with boolean
>   // the name of the regionserver sending the data
>   if (in.readBoolean()) {
>     byte [] versionedBytes = Bytes.readByteArray(in);
>     this.origin = ServerName.parseVersionedServerName(versionedBytes);
>   }
>   if (in.readBoolean()) {
>     this.payload = Bytes.readByteArray(in);
>   }
> }
> {code}
> If execution gets as far as reading regionName, we can include regionName in the IOException,
> along with a message on how to clean up the stale znode(s) under /hbase/unassigned.
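A rough sketch of the proposed improvement, under stated assumptions: this is a simplified, self-contained stand-in, not HBase's RegionTransitionData or ZKAssign. The class UnassignedZNodeDecoder, its placeholder EventType values, and the getData(znodeParent, encodedRegionName, data) wrapper are all hypothetical. It relies on the point made in the description that each znode name under <parent>/unassigned is the encoded region name, so a caller can name the bad znode even when decoding fails on the very first field. For reference, a leading short of -256, as in the trace, is what readShort() returns for the bytes 0xFF 0x00. Per the comment above, the parent path is passed in rather than hard-coded as /hbase.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical, simplified decoder for data stored under <parent>/unassigned.
// It is NOT HBase's RegionTransitionData; it only reads the leading event-type
// short and the timestamp to show where a descriptive IOException could be raised.
public class UnassignedZNodeDecoder {

  // Placeholder event types; HBase's real EventType enum has different values.
  public enum EventType { OFFLINE, OPENING, OPENED }

  public static final class Transition {
    public final EventType eventType;
    public final long stamp;

    Transition(EventType eventType, long stamp) {
      this.eventType = eventType;
      this.stamp = stamp;
    }
  }

  // Validates the ordinal instead of indexing EventType.values() blindly;
  // blind indexing throws ArrayIndexOutOfBoundsException for values like -256.
  public static Transition decode(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    short ordinal = in.readShort();
    EventType[] values = EventType.values();
    if (ordinal < 0 || ordinal >= values.length) {
      throw new IOException("Invalid event type ordinal " + ordinal);
    }
    long stamp = in.readLong(); // throws EOFException (an IOException) if truncated
    return new Transition(values[ordinal], stamp);
  }

  // Caller-side wrapper: the znode name is the encoded region name, so it is
  // known even if decoding fails before regionName would have been read.
  // znodeParent stands for the configured zookeeper.znode.parent value.
  public static Transition getData(String znodeParent, String encodedRegionName,
      byte[] data) throws IOException {
    try {
      return decode(data);
    } catch (IOException | RuntimeException e) {
      String path = znodeParent + "/unassigned/" + encodedRegionName;
      throw new IOException("Failed to parse region transition data in znode " + path
          + " (region " + encodedRegionName + "). If the znode is stale, remove it"
          + " with zkCli.sh: delete " + path, e);
    }
  }
}
```

Wrapping at the caller rather than inside readFields() is just one possible design: it sidesteps the "if execution gets as far as regionName" limitation, because the region name comes from the znode path rather than from the bytes being parsed. Catching RuntimeException as well means that blind-indexing failures such as the one in the trace would also surface with the znode path attached.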