Please take a histo:live dump when the memory is full. Note that this causes a
full GC.
http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html
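
For example, something along these lines against the NameNode process should
produce the histogram (<nn_pid> here is just a placeholder for the NameNode's
process id):

  jmap -histo:live <nn_pid> > nn_histo.txt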

How many blocks do you have on the system?
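
If you don't have the count handy, fsck reports it; something like the
following should work (the exact wording of the summary line can vary by
version):

  hadoop fsck / | grep -i 'Total blocks'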

Send the JVM options you are using. Earlier Java versions used 1/8 of the
total heap for the young generation; it has since gone up to 1/3 of the total
heap. This could also be the reason.
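
If the larger young generation does turn out to be the problem, you could size
it explicitly instead of relying on the default ratio, e.g. in hadoop-env.sh
(the -Xmn value below is only illustrative):

  export HADOOP_NAMENODE_OPTS="-Xmn2g $HADOOP_NAMENODE_OPTS"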

Do you collect GC logs? Send those as well.
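
If GC logging is not already enabled, flags along these lines will capture it
(the log path is just an example):

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop/nn-gc.log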

Sent from a mobile device

On Dec 22, 2012, at 9:51 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> Newer 1.6 releases are getting close to 1.7, so I am not going to fear a
> number and fight the future.
> 
> I have been at around 27 million files for a while, and have been as high as
> 30 million. I do not think that is related.
> 
> I do not think it is related to checkpoints but I am considering
> raising/lowering the checkpoint triggers.
> 
> On Saturday, December 22, 2012, Joep Rottinghuis <jrottingh...@gmail.com>
> wrote:
>> Do your OOMs correlate with the secondary checkpointing?
>> 
>> Joep
>> 
>> Sent from my iPhone
>> 
>> On Dec 22, 2012, at 7:42 AM, Michael Segel <michael_se...@hotmail.com> wrote:
>> 
>>> Hey Silly question...
>>> 
>>> How long have you had 27 million files?
>>> 
>>> I mean, can you correlate the number of files to the spate of OOMs?
>>> 
>>> Even without problems... I'd say it would be a good idea to upgrade due to
>>> the probability of a lot of code fixes...
>>> 
>>> If you're running anything pre 1.x, going to Java 1.7 wouldn't be a good
>>> idea. Having said that... outside of MapR, have any of the distros
>>> certified themselves on 1.7 yet?
>>> 
>>> On Dec 22, 2012, at 6:54 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> 
>>>> I will give this a go. I actually went into JMX and manually triggered
>>>> GC; no memory is returned. So I assumed something was leaking.
>>>> 
>>>> On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris <afa...@linkedin.com> wrote:
>>>> 
>>>>> I know this will sound odd, but try reducing your heap size. We had an
>>>>> issue like this where GC kept falling behind and we either ran out of
>>>>> heap or would be in full GC. By reducing heap, we were forcing concurrent
>>>>> mark sweep to occur and avoided both full GC and running out of heap
>>>>> space as the JVM would collect objects more frequently.
>>>>> 
>>>>> On Dec 21, 2012, at 8:24 PM, Edward Capriolo <edlinuxg...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I have an old Hadoop 0.20.2 cluster. Have not had any issues for a while
>>>>>> (which is why I never bothered an upgrade).
>>>>>> 
>>>>>> Suddenly it OOMed last week. Now the OOMs happen periodically. We have a
>>>>>> fairly large NameNode heap (Xmx 17GB). It is a fairly large FS, about
>>>>>> 27,000,000 files.
>>>>>> 
>>>>>> So the strangest thing is that every hour and a half the NN memory usage
>>>>>> increases until the heap is full.
>>>>>> 
>>>>>> http://imagebin.org/240287
>>>>>> 
>>>>>> We tried failing over the NN to another machine. We changed the Java
>>>>>> version from 1.6_23 -> 1.7.0.
>>>>>> 
>>>>>> I have set the NameNode logs to debug and ALL and I have done the same
>>>>>> with the data nodes.
>>>>>> Secondary NN is running and shipping edits and making new images.
>>>>>> 
>>>>>> I am thinking something has corrupted the NN metadata and after enough
>>>>>> time it becomes a time bomb, but this is just a total shot in the dark.
>>>>>> Does anyone have any interesting troubleshooting ideas?
>> 
