We just found ourselves in an interesting pickle.

We were upgrading one of our clusters from HBase 0.94.0 on Hadoop 1.0.4 to 
HBase 0.94.4 on top of Hadoop 2.
The cluster had been set up a while ago, and the old shutdown script had a bug 
that shut down HBase and HDFS uncleanly.

Assuming that the logs would be replayed, we upgraded Hadoop to 2.0.x and 
verified that, from a file system view, everything was OK.
The new HDFS runs with an HA NameNode, so the FS changed from hdfs://<old host 
name> to hdfs://<ha cluster name>.


Then we brought up HBase and found it stuck in splitting logs forever.
In the log we see messages like these:
2013-02-05 06:22:31,045 ERROR org.apache.hadoop.hbase.regionserver.SplitLogWorker: unexpected error
java.lang.IllegalArgumentException: Wrong FS: hdfs://<old NN host>/.logs/<rs host>,60020,1358540589323-splitting/<rs host>%2C60020%2C1358540589323.1359962644861, expected: hdfs://<ha cluster name>
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:169)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:783)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:111)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:264)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:195)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:163)
        at java.lang.Thread.run(Thread.java:662)

So it looks like distributed log splitting stores the full HDFS path name 
including the host, which seems unnecessary.
This path is stored in ZK.
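
This is easy to see in ZK directly. A minimal sketch, assuming the default 
zookeeper.znode.parent of /hbase and that the split tasks sit under a 
"splitlog" child znode (as they appear to in 0.94):

    ./zkCli.sh -server <zk quorum host>:2181
    ls /hbase/splitlog

The task znode names look like (URL-encoded) fully qualified WAL paths, which 
is why the old hdfs://<old NN host> authority survives the move to the HA 
NameNode.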

So, all in all, it seems this can only happen if all of the following are true: 
unclean shutdown, keeping the same ZK ensemble, and a changed FS.


The data is not important; we could just blow it away, but we want to prove 
that we could recover the data if we had to.
It seems we have three options:

1. Blow away the data in ZK under "splitlog" and restart HBase. It should 
restart the split process with the correct pathnames (see the zkCli sketch 
after this list).

2. Temporarily change the config for the region server to set the root dir to 
hdfs://<old NN host>, bounce HBase. The log splitting should now be able to 
succeed.
3. Downgrade back to the old Hadoop (we kept a copy of the image).
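
For #1, something along these lines should do it (just a sketch, assuming the 
default /hbase parent znode and a zkCli that supports rmr; HBase stopped first):

    ./zkCli.sh -server <zk quorum host>:2181
    rmr /hbase/splitlog

On the next startup the master should re-scan the *-splitting directories under 
the new hdfs://<ha cluster name> root and queue fresh split tasks with the 
correct pathnames.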

We're trying option #2 to see whether that fixes it; #1 should work too.
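
For reference, the temporary change for #2 is just hbase.rootdir in 
hbase-site.xml on the region servers, something like this (placeholders as 
above; keep whatever path the old rootdir used):

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://<old NN host>/</value>
    </property>

Once the splitting finishes we'd point it back at hdfs://<ha cluster name> and 
bounce HBase again.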


Has anybody else experienced this?
It seems this would also limit our ability to take a snapshot of a filesystem 
and move it somewhere else, since the hostnames are hardcoded, at least in ZK 
for log splitting.


-- Lars
