2013/8/30 Todd Lipcon <t...@cloudera.com>
> If you're seeing those log messages, the SBN was already active at that
> time. It only logs that message when successfully writing transactions. So,
> the failover must have already completed before the logs you're looking at.
>
> -Todd
>
> On Thu, Aug 29, 2013 at 1:18 AM, Mickey <huanfeng...@gmail.com> wrote:
>
> > Hi all,
> > I have been testing QJM HA and it has always worked well. But yesterday I
> > hit a very long failover with QJM. The test is based on CDH 4.3.0.
> > The attachment contains the standby NameNode and JournalNode logs.
> > The network cable on the active NameNode (also a DataNode) was pulled out
> > at about 07:24. In the standby NameNode log I found entries like this:
> >
> > 2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
> > 2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 32 Total time for transactions(ms): 3Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46
> >
> > These entries look normal. The problem is that nothing is logged for
> > about 12 minutes between the two lines. No long GC pause occurred. It
> > seems the code was blocked somewhere. Unfortunately, I forgot to capture
> > the jstack output T_T.
> >
> > I hope to hear from you.
> >
> > Best regards,
> > Mickey
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
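
For anyone trying to locate a silent stretch like the one between the two FSEditLog entries quoted above, here is a minimal Python sketch that scans log lines for large gaps between consecutive timestamps. It is not a Hadoop tool. The function name `find_gaps`, the one-minute default threshold, and the assumption that every entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp (as in the quoted lines) are all illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: each log entry starts with "2013-08-28 07:24:51,122".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold=timedelta(minutes=1)):
    """Return (previous_ts, current_ts, gap) for each gap >= threshold."""
    gaps = []
    prev = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            # Continuation lines (stack traces, wrapped text) carry no timestamp.
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps
```

Calling `find_gaps(open(path))` on the standby NameNode log, with `path` pointing at wherever that log lives, would list each silent window. For the two entries quoted above, the gap comes out at a little over 11 minutes. A tool like this only shows when the process was quiet, so a thread dump taken during the window is still needed to see where it was blocked.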