Re: Long time fail over when using QJM

Mickey Thu, 29 Aug 2013 19:08:07 -0700

Sorry for the empty mail.

Thanks, Todd.
In the test, my HBase did not work for a long time. Maybe something is
wrong with my HBase setup. I will run more tests.
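
For the next round of tests, a small script could flag long silences in a NameNode log automatically instead of spotting them by eye. This is only a sketch: `find_gaps`, its `threshold` parameter, and `SAMPLE` are hypothetical names of my own, and it assumes every log entry starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp like the entries quoted below.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log entry,
# e.g. "2013-08-28 07:24:51,122" (23 characters).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23


def find_gaps(lines, threshold=timedelta(minutes=1)):
    """Return (previous_ts, current_ts, gap) for every pair of consecutive
    timestamped lines that are at least `threshold` apart."""
    gaps = []
    prev = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Continuation lines (stack traces, wrapped text) carry no timestamp.
            continue
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


# The two timestamps from the standby NameNode log in this thread
# (message text shortened for illustration).
SAMPLE = [
    "2013-08-28 07:24:51,122 INFO FSEditLog: Number of transactions: 1",
    "2013-08-28 07:36:14,028 INFO FSEditLog: Number of transactions: 32",
]

if __name__ == "__main__":
    for start, end, gap in find_gaps(SAMPLE):
        print(f"no log entries between {start} and {end} ({gap})")
```

A gap found this way only says the process logged nothing in that window. It does not say why, so a jstack dump taken during the silence would still be needed.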


Thanks,
Mickey


2013/8/30 Mickey <huanfeng...@gmail.com>

>
>
>
> 2013/8/30 Todd Lipcon <t...@cloudera.com>
>
>> If you're seeing those log messages, the SBN was already active at that
>> time. It only logs that message when successfully writing transactions.
>> So, the failover must have already completed before the logs you're
>> looking at.
>>
>> -Todd
>>
>> On Thu, Aug 29, 2013 at 1:18 AM, Mickey <huanfeng...@gmail.com> wrote:
>>
>> > Hi, all
>> > I have been testing QJM HA and it has always worked well. But yesterday I
>> > hit a very long failover with QJM. The test is based on CDH4.3.0.
>> > The attachment contains the standby namenode and journalnode logs.
>> > The network cable on the active namenode (also a datanode) was pulled out
>> > at about 07:24. In the standby namenode log I found entries like this:
>> > 2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
>> > 2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 32 Total time for transactions(ms): 3Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46
>> >
>> > The entries themselves look normal. The problem is that nothing was
>> > logged between these 2 lines for about 12 minutes. No long GC pause
>> > happened. It seems the code blocked somewhere. Unfortunately, I forgot
>> > to capture the jstack output T_T.
>> >
>> > Hope for your response.
>> >
>> > Best regards,
>> > Mickey
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
