[
https://issues.apache.org/jira/browse/HBASE-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550900#comment-13550900
]
Anoop Sam John commented on HBASE-7034:
---------------------------------------
Did this code come in by mistake?
{code}
RecoverableZooKeeper#setData(String path, byte[] data, int version){
  ....
  byte[] revData = zk.getData(path, false, stat);
  int idLength = Bytes.toInt(revData, ID_LENGTH_SIZE);
  int dataLength = revData.length-ID_LENGTH_SIZE-idLength;
  int dataOffset = ID_LENGTH_SIZE+idLength;
  if(Bytes.compareTo(revData, ID_LENGTH_SIZE, id.length,
      revData, dataOffset, dataLength) == 0) {
    // the bad version is caused by previous successful setData
    return stat;
  }
}
{code}
When we write data to ZooKeeper, we write an identifier for the process along
with it. Here, in order to check whether the BADVERSION exception from
ZooKeeper is due to a previous setData from the same process, we need to
compare the id read from ZooKeeper with the id of this process (this.id). Or
am I missing something? The offset and length math above, and the compare,
look problematic to me: the compare is between two regions of the same
revData (the id bytes against the data bytes), and this.id is used only for
its length.
If so, I guess this is the cause of this bug.
From the log it is clear that there is no problem with the node or its version
at first. [As part of the transition of state from OPENING to OPENED, the
present data is read first, and the check shows that the data and its version
are fine.] Immediately afterwards a connection loss happened, which triggers a
retry of the setData. Maybe the previous operation did change the data in
ZooKeeper and the master got the data-changed event. (?)
I think correcting the above code may solve the problems.
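To make the suggestion concrete, here is a rough, self-contained sketch of the
comparison I would expect. It does not use the HBase classes. The layout
[4-byte id length][id bytes][payload], the class name IdCheckSketch and the
helpers appendMetaData and writtenBy are assumptions for illustration only,
not the actual RecoverableZooKeeper code or its on-wire format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; names and byte layout are assumptions, not HBase's own.
class IdCheckSketch {
    // Assumed layout of a znode value: [4-byte id length][id bytes][payload]
    static final int ID_LENGTH_SIZE = 4;

    // Prefix the payload with the writer's id, following the assumed layout.
    static byte[] appendMetaData(byte[] id, byte[] data) {
        return ByteBuffer.allocate(ID_LENGTH_SIZE + id.length + data.length)
                .putInt(id.length)
                .put(id)
                .put(data)
                .array();
    }

    // True when the id stored in revData equals this process's id, i.e. the
    // value in the znode was written by an earlier setData of this process.
    static boolean writtenBy(byte[] id, byte[] revData) {
        if (revData == null || revData.length < ID_LENGTH_SIZE) {
            return false;
        }
        // The id length sits at offset 0 in the assumed layout.
        int idLength = ByteBuffer.wrap(revData).getInt(0);
        if (idLength < 0 || idLength > revData.length - ID_LENGTH_SIZE) {
            return false; // malformed or foreign data
        }
        byte[] storedId = Arrays.copyOfRange(
                revData, ID_LENGTH_SIZE, ID_LENGTH_SIZE + idLength);
        // Compare the stored id with this process's id, not with the payload.
        return Arrays.equals(storedId, id);
    }
}
```

With a check of this shape, a retry after BADVERSION would return the stat
only when writtenBy(this.id, revData) is true, and would rethrow the
exception otherwise.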
> Bad version, failed OPENING to OPENED but master thinks it is open anyways
> --------------------------------------------------------------------------
>
> Key: HBASE-7034
> URL: https://issues.apache.org/jira/browse/HBASE-7034
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 0.94.2
> Reporter: stack
>
> I have this in RS log:
> {code}
> 2012-10-22 02:21:50,698 ERROR
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
> transitioning node
> b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f.
> from OPENING to OPENED -- closing region
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> BadVersion for /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f
> {code}
> Master says this (it is bulk assigning):
> {code}
> ....
> 2012-10-22 02:21:40,673 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:10302-0xb3a862e57a503ba Set watcher on existing znode
> /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f
> ...
> then this
> ....
> 2012-10-22 02:23:47,089 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:10302-0xb3a862e57a503ba Set watcher on existing znode
> /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f
> ....
> 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:10302-0xb3a862e57a503ba Retrieved 112 byte(s) of data from znode
> /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f and set watcher;
> region=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f.,
> origin=sv4r17s44,10304,1350872216778, state=RS_ZK_REGION_OPENED
> etc.
> {code}
> Disagreement as to what is going on here.