In HLog.splitLog(), the warning for an empty hlog should display the path of
logfiles[i]:
          } catch (IOException e) {
            if (length <= 0) {
              LOG.warn("Empty hlog, continuing: " +
                  logfiles[i].getPath() + " count=" + count, e);
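
A minimal, self-contained sketch of the suggested change (hypothetical class
and method names -- this is not the actual HLog.splitLog body, just the
logging pattern in isolation):

    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.fs.FileStatus;

    public class EmptyHlogWarning {
      private static final Log LOG =
          LogFactory.getLog(EmptyHlogWarning.class);

      // Warn about an empty hlog, naming the exact file being skipped.
      // The point of the change is that logging logfiles[i].getPath()
      // identifies the offending file in HDFS, so it can be inspected
      // (or grepped for in the namenode log) afterwards.
      static void warnEmptyHlog(FileStatus[] logfiles, int i, int count,
          long length, IOException e) {
        if (length <= 0) {
          LOG.warn("Empty hlog, continuing: " + logfiles[i].getPath() +
              " count=" + count, e);
        }
      }
    }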

I added some more to master log: http://pastebin.com/yGbQ2Cv5

I searched the datanode log around the time the EOFException was reported but
didn't see any abnormality.

On Sat, Sep 25, 2010 at 12:10 AM, Ted Yu <[email protected]> wrote:

> Stack:
> We use the Hadoop from CDH3b2.
>
> w.r.t. WAL log, I found the following from name node log:
> http://pastebin.com/35QYp21f
>
> Here is snippet from HBase master (on 10.32.56.155) log:
> http://pastebin.com/v5cFAKqt
>
> Looks like this was related to HLog splitting.
>
> w.r.t. the RC J-D put up today, I am considering two factors:
> 1. Our QA team (and I) are used to 0.20.6 - I need to convince them that
> 0.89 is much better.
> 2. As you indicated in another thread, you will decide whether to roll out
> a new release with the master rewrite. I am waiting for that decision. If
> that release comes sooner, we may skip 0.89.
>
> Cheers
>
>
> On Fri, Sep 24, 2010 at 11:03 PM, Stack <[email protected]> wrote:
>
>> On Fri, Sep 24, 2010 at 9:06 PM, Ted Yu <[email protected]> wrote:
>> > I see this log following the previous snippet:
>> >
>> > 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > Recovery for block null bad datanode[0] nodes == null
>> > 2010-09-24 11:21:43,799 WARN org.apache.hadoop.hdfs.DFSClient: Could not
>> > get block locations. Source file
>> > "/hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512"
>> > - Aborting...
>> > 2010-09-24 11:21:45,417 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close
>> > log in abort
>>
>> So we were aborting, and the one thing we'll try to do on our way out
>> when aborting is close the WAL log.  Seems like that failed in the
>> above.  (This stuff is odd -- 'Recovery for block null bad datanode[0]
>> nodes == null'... anything in your datanode logs to explain this?  And
>> if you grep for the WAL log name in the namenode log, do you see
>> anything interesting?)
>>
>> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
>> > org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>> > on /hbase/.logs/sjc9-flash-grid02.carrieriq.com,60020,1285347585107/hlog.dat.1285351187512
>> > File does not exist. Holder DFSClient_302121899 does not have any open
>>
>>
>> Hmm... says it does not exist.
>>
>> So, yeah, for sure, check out the namenode logs.
>>
>> Hey Ted, are you fellas running 0.20.x still?  If so, what would it
>> take to get you fellas up on 0.89, say the RC J-D put up today?
>>
>>
>> > Would a failure from hlog.close() lead to data loss?
>> >
>>
>> Are you still on 0.20 HBase?  If so, yes.  If on 0.89 with a Hadoop
>> 0.20 that has append support (Apache -append branch or CDH3b2), then
>> only some small amount may have been lost.
>> St.Ack
>>
>
>
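
For context, the "close the WAL on the way out" behavior Stack describes
above boils down to a best-effort close during abort, where a close failure
is logged rather than rethrown.  A rough sketch under that assumption
(illustrative names, not the real HRegionServer code):

    import java.io.Closeable;
    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    class AbortPathSketch {
      private static final Log LOG =
          LogFactory.getLog(AbortPathSketch.class);

      // Best-effort WAL close on abort: a failure is logged, not
      // rethrown, so the server can still exit -- but any unsynced
      // edits in the log may be lost, which is what the data-loss
      // question above is about.
      static void closeWalOnAbort(Closeable hlog) {
        try {
          if (hlog != null) {
            hlog.close();
          }
        } catch (IOException e) {
          LOG.error("Unable to close log in abort", e);
        }
      }
    }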
