Re: OOM Exception

Brian Burruss Wed, 16 Dec 2009 18:48:46 -0800

Glad to hear "that" bug is fixed ;)

Can the configuration params like memtable size be changed between server 
starts without clearing the data?



Jonathan Ellis <[email protected]> wrote:


You're OOMing after log replay finishes there.  So I can still
maintain that beta2 fixed the "replay uses more memory" bug :)

It looks like you're running out of memory when the other node
restarts, and it needs to read the hinted rows into memory to send
them over.

I suggest halving your MemtableSizeInMB; 1.5GB is pretty large.

On Wed, Dec 16, 2009 at 7:01 PM, Brian Burruss <[email protected]> wrote:
> attached ... the log starts when i restarted server.  notice that not too far 
> into it is when the other node went down because of OOM and i restarted it as 
> well.
>
> ________________________________________
> From: Jonathan Ellis [[email protected]]
> Sent: Wednesday, December 16, 2009 4:53 PM
> To: [email protected]
> Subject: Re: OOM Exception
>
> sorry, i meant the system.log the 2nd time (clear it out before
> replaying so it's not confused w/ other info, pls)
>
> On Wed, Dec 16, 2009 at 5:39 PM, Brian Burruss <[email protected]> wrote:
>> is this what you want?  they are big - i'd rather not spam everyone with 
>> them.  if you need them or the hprof files i can tar them and send them to 
>> you.
>>
>> thx!
>>
>>
>> [bburr...@gen-app02 cassandra]$ ls -l ~/cassandra/btoddb/commitlog/
>> total 597228
>> -rw-rw-r-- 1 bburruss bburruss 134219796 Dec 16 13:52 CommitLog-1260995895123.log
>> -rw-rw-r-- 1 bburruss bburruss 134218547 Dec 16 13:52 CommitLog-1260997811317.log
>> -rw-rw-r-- 1 bburruss bburruss 134218331 Dec 16 13:52 CommitLog-1260998497744.log
>> -rw-rw-r-- 1 bburruss bburruss 134219677 Dec 16 13:53 CommitLog-1261000330587.log
>> -rw-rw-r-- 1 bburruss bburruss  74055680 Dec 16 14:49 CommitLog-1261000439079.log
>> [bburr...@gen-app02 cassandra]$
>>
>> ________________________________________
>> From: Jonathan Ellis [[email protected]]
>> Sent: Wednesday, December 16, 2009 3:29 PM
>> To: [email protected]
>> Subject: Re: OOM Exception
>>
>> How large are the log files being replayed?
>>
>> Can you attach the log from a replay attempt?
>>
>> On Wed, Dec 16, 2009 at 5:21 PM, Brian Burruss <[email protected]> wrote:
>>> sorry, thought i included everything ;)
>>>
>>> however, i am using beta2
>>>
>>> ________________________________________
>>> From: Jonathan Ellis [[email protected]]
>>> Sent: Wednesday, December 16, 2009 3:18 PM
>>> To: [email protected]
>>> Subject: Re: OOM Exception
>>>
>>> What version are you using?  0.5 beta2 fixes the
>>> using-more-memory-on-startup problem.
>>>
>>> On Wed, Dec 16, 2009 at 5:16 PM, Brian Burruss <[email protected]> wrote:
>>>> i'll put my question first:
>>>>
>>>> - how can i determine how much RAM is required by cassandra?  (for normal 
>>>> operation and restarting server)
>>>>
>>>> *** i've attached my storage-conf.xml
>>>>
>>>> i've gotten several more OOM exceptions since i mentioned it a week or so 
>>>> ago.  i started from a fresh database a couple days ago and have been 
>>>> adding 2k blocks of data keyed off a random integer at the rate of about 
>>>> 400/sec.  i have a 2 node cluster, RF=2, Consistency for read/write is 
>>>> ONE.  there are ~70,420,082 2k blocks of data in the database.
>>>>
>>>> i used the default memory setup of Xmx1G when i started a couple days ago.
>>>> as the database grew to ~180G (reported by unix du command) both servers
>>>> OOM'ed at about the same time, within 10 minutes of each other.  well 
>>>> needless to say, my cluster is dead.  so i upped the memory to 3G and the 
>>>> servers tried to come back up, but one died again with OOM.
>>>>
>>>> Before cleaning the disk and starting over a couple days ago, i played the 
>>>> game of "jack up the RAM", but eventually i didn't want to up it anymore 
>>>> when i got to 5G.  the parameter, SSTable.INDEX_INTERVAL, was discussed a 
>>>> few days ago that would change the number of "keys" cached in memory, so i 
>>>> could modify that at the cost of read performance, but doing the math, 3G 
>>>> should be plenty of room.
>>>>
>>>> it seems like startup requires more RAM than just normal running.
>>>>
>>>> so this of course concerns me.
>>>>
>>>> i have the hprof files from when the server initially crashed and when it 
>>>> crashed trying to restart if anyone wants them
>>>>
>>>
>>
>
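
For anyone following the "doing the math" part of the original question, here is a rough back-of-the-envelope sketch using only the numbers posted in this thread. It is not a sizing formula for Cassandra. Two things are assumptions and not from the thread: that "2k" means 2 KiB of payload per row, and the INDEX_INTERVAL value of 128 (the thread names the parameter but never gives its value). The function names are made up for illustration.

```python
# Rough sizing arithmetic for the workload described in this thread.
# Only illustrates the numbers; it does not model Cassandra's heap usage.

BLOCK_COUNT = 70_420_082   # rows reported in the thread
BLOCK_BYTES = 2 * 1024     # ASSUMPTION: "2k blocks" means 2 KiB payload per row
INDEX_INTERVAL = 128       # ASSUMPTION: value is not stated in the thread

# Commit log segment sizes in bytes, copied from the ls -l listing above.
COMMITLOG_BYTES = [134219796, 134218547, 134218331, 134219677, 74055680]


def raw_payload_gib(rows, row_bytes):
    """Payload size in GiB, ignoring keys, column overhead and indexes."""
    return rows * row_bytes / 2**30


def sampled_index_entries(rows, interval):
    """Number of keys kept if every interval-th key is sampled (rounded up)."""
    return -(-rows // interval)


if __name__ == "__main__":
    print("raw payload GiB:", round(raw_payload_gib(BLOCK_COUNT, BLOCK_BYTES), 1))
    print("sampled index entries:", sampled_index_entries(BLOCK_COUNT, INDEX_INTERVAL))
    print("commit log MiB to replay:", round(sum(COMMITLOG_BYTES) / 2**20, 1))
```

With RF=2 on a two-node cluster, each node holds a full copy of the data, so the raw payload figure applies per node. The gap between that figure and the ~180G reported by du is presumably keys, per-column overhead, index files and not-yet-compacted data, but the thread does not confirm that breakdown.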
