Hi Raghu,

The thread you posted is my original post written when this problem first
happened on my cluster. I can file a JIRA but I wouldn't be able to provide
information other than what I already posted and I don't have the logs from
that time. Should I still file?

Thanks,
Tamir


On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi <rang...@yahoo-inc.com> wrote:

> Tamir,
>
> Please file a jira on the problem you are seeing with 'saveLeases'. In the
> past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724,
> and more mentioned in HADOOP-3724).
>
> Also refer to the thread you started
> http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html
>
> I think another user reported the same problem recently.
>
> These are indeed very serious and very annoying bugs.
>
> Raghu.
>
>
> Tamir Kamara wrote:
>
>> I didn't have a space problem that led to it (I think). The corruption
>> started after I bounced the cluster.
>> At the time, I tried to investigate what led to the corruption but didn't
>> find anything useful in the logs besides this line:
>> saveLeases found path
>> /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002
>> but no matching entry in namespace
>>
>> I also tried to recover from the secondary name node files but the
>> corruption was too widespread and I had to format.
>>
>> Tamir
>>
>> On Mon, May 4, 2009 at 4:48 PM, Stas Oskin <stas.os...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> Same conditions - where the space ran out and the fs got corrupted?
>>>
>>> Or did it get corrupted by itself (which is even more worrying)?
>>>
>>> Regards.
>>>
>>> 2009/5/4 Tamir Kamara <tamirkam...@gmail.com>
>>>
>>>> I had the same problem a couple of weeks ago with 0.19.1. Had to
>>>> reformat the cluster too...
>>>>
>>>> On Mon, May 4, 2009 at 3:50 PM, Stas Oskin <stas.os...@gmail.com> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> After rebooting the NameNode server, I found out the NameNode doesn't
>>>>> start anymore.
>>>>>
>>>>> The logs contained this error:
>>>>> "FSNamesystem initialization failed"
>>>>>
>>>>> I suspected filesystem corruption, so I tried to recover from the
>>>>> SecondaryNameNode. Problem is, it was completely empty!
>>>>>
>>>>> I had an issue that might have caused this - the root mount had run out
>>>>> of space. But both the NameNode and the SecondaryNameNode directories
>>>>> were on another mount point with plenty of space there - so it's very
>>>>> strange that they were impacted in any way.
>>>>>
>>>>> Perhaps the logs, which were located on the root mount and as a result
>>>>> could not be written, have caused this?
>>>>>
>>>>> To get HDFS running again, I had to format the HDFS (including manually
>>>>> erasing the files from the DataNodes). While this is reasonable in a
>>>>> test environment, production-wise it would be very bad.
>>>>>
>>>>> Any idea why it happened, and what can be done to prevent it in the
>>>>> future?
>>>>>
>>>>> I'm using the stable 0.18.3 version of Hadoop.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>
>
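P.S. On the mount-point point raised in the quoted thread: one mitigation worth considering is giving the NameNode more than one name directory, since it writes the fsimage and edits log to every directory listed, so a copy on a second disk survives corruption of the first. A sketch of the relevant hadoop-site.xml entries (the property names are the standard 0.18-era ones; the paths are made up for illustration):

```xml
<!-- hadoop-site.xml sketch; the /mnt/... paths below are hypothetical -->
<property>
  <name>dfs.name.dir</name>
  <!-- Comma-separated list: the NameNode stores the namespace image and
       edits log in each directory, giving a redundant on-disk copy. -->
  <value>/mnt/disk1/dfs/name,/mnt/disk2/dfs/name</value>
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <!-- Where the SecondaryNameNode keeps its checkpoint of the image;
       best kept off the root mount so log/space issues there can't touch it. -->
  <value>/mnt/disk2/dfs/namesecondary</value>
</property>
```

This doesn't explain why the image got corrupted in the first place, but it would at least leave a second copy to recover from instead of having to reformat.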
