Hey Mickey,

I have a few follow-up questions:
For how long were these threads blocked? What happens afterwards: does the regionserver resume, or does it abort?

Could you also pastebin the logs from after the above exception? A sync failure causes a log roll, which is retried based on the value of hbase.regionserver.logroll.errors.tolerated.

Which 0.94 version are you using?

Thanks,
Himanshu

On Mon, Sep 2, 2013 at 5:16 AM, Mickey <[email protected]> wrote:

> Hi, all
>
> I was testing HBase with HDFS QJM HA recently. Hadoop version is CDH 4.3.0
> and HBase is based on 0.94 with some patches(include HBASE-8211)
> In a test, I met a blocking issue in HBase. I killed a node which is the
> active namenode, also datanode, regionserver on it.
>
> The HDFS fail over successfully. The master tried re-assign the regions
> after detecting the regionserver down. But no region can be online.
>
> From the log I found all operations to .META. failed. Printing the jstack
> of the region server who contains the .META. , I found info below:
>
> "regionserver60020.logSyncer" daemon prio=10 tid=0x00007f317007e800 nid=0x27ee5 in Object.wait() [0x00007f318add9000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1708)
>         - locked <0x00007f34ae7b3638> (a java.util.LinkedList)
>         at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1609)
>         at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1525)
>         at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1510)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:1208)
>         at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:303)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1290)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1247)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1400)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1199)
>         at java.lang.Thread.run(Thread.java:662)
>
> The logSyncer is always waiting on waitForAckedSeqno. All the HLog
> operations seems blocked. Is this a bug? Or I missed some important
> patches?
>
> Hope to get your suggestions soon.
>
> Best regards,
> Mickey
