We just had all RegionServers die in a test cluster, all with the following 
exception.
(This is CDH4.2.1 with HBase 0.94.7 built against it.)

Strangely, HDFS is up and running (I can ls all directories, create files in it, 
etc., and HDFS's fsck reports that all is well), yet the RSs died with this.
This almost looks like a race where the directories under .logs were yanked 
away while they were still in use.

I plan to investigate this further. In any event, has anybody seen this issue 
(or anything similar to this) before?
When this happened there was no load on the cluster (other than some writes from 
OTSDB).

Thanks.

-- Lars

2013-05-08 16:02:41,178 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server <host>,60020,1367614452787: IOE in log roller
java.io.IOException: Exception in createWriter
        at org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:66)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:715)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:648)
        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:95)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: cannot get log writer
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:771)
        at org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:60)
        ... 4 more
Caused by: java.io.IOException: java.io.FileNotFoundException: Parent directory doesn't exist: /hbase/.logs/<host>,60020,1367614452787
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1726)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1848)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1770)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1747)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:418)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:173)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:768)
        ... 5 more
