[ https://issues.apache.org/jira/browse/CASSANDRA-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793735#comment-13793735 ]

Tomas Salfischberger edited comment on CASSANDRA-6191 at 10/13/13 6:20 PM:
---------------------------------------------------------------------------

Ok, I've indeed done that, based on the recommended size in CASSANDRA-5727 (and 
at that point I ran into CASSANDRA-6191 :-)). On IRC rcoli said he will write a 
blog post documenting the LCS with 5 MB -> STCS -> LCS with 256 MB route.

However, I think we might still want something that un-references the buffer 
when the reader is added to the pool, or at least a WARN message in the logs 
when we're opening tens of thousands of readers. This is very hard to diagnose 
without reading the code, which we can't expect normal users to do.
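
As a rough illustration of what I mean (the class and method names below are 
made-up stand-ins, not the actual Cassandra 1.2 API): the pool drops the 
per-reader chunk buffer when a reader is checked back in, and the reader 
re-allocates it lazily on the next read.

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative stand-in for a pooled compressed reader: the 64 KB
    // chunk buffer is dropped when the reader is recycled and re-created
    // lazily on the next read.
    class PooledCompressedReader {
        private static final int CHUNK_SIZE = 65536;
        private ByteBuffer compressed;            // may be null while pooled

        ByteBuffer buffer() {
            if (compressed == null)
                compressed = ByteBuffer.allocate(CHUNK_SIZE);
            return compressed;
        }

        void releaseBuffer() {
            compressed = null;                    // let GC reclaim the byte[]
        }
    }

    class ReaderPool {
        private final Deque<PooledCompressedReader> pool = new ArrayDeque<>();

        // The suggested un-referencing step happens here, on recycle.
        void recycle(PooledCompressedReader reader) {
            reader.releaseBuffer();
            pool.push(reader);
        }

        PooledCompressedReader borrow() {
            PooledCompressedReader r = pool.poll();
            return r != null ? r : new PooledCompressedReader();
        }
    }

The WARN alternative could live in the same place: count the open readers per 
pool (or globally) and log a warning once the count crosses some threshold.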

Edit: Ah, CASSANDRA-5661 has a method to close them. That's of course the best 
fix. Any chance of back-porting it to 1.2?


> Memory exhaustion with large number of compressed SSTables
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-6191
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6191
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: OS: Debian 7.1
> Java: Oracle 1.7.0_25
> Cassandra: 1.2.10
> Memory: 24GB
> Heap: 8GB
>            Reporter: Tomas Salfischberger
>
> Not sure "bug" is the right description, because I can't say for sure that 
> the large number of SSTables is the cause of the memory issues. I'll share my 
> research so far:
> Under high read-load with a very large number of compressed SSTables (caused 
> by the initial default 5 MB sstable_size in LCS) it seems memory is exhausted, 
> with no way for GC to fix this: it tries to collect but doesn't reclaim much.
> The node first hits the "emergency valves", flushing all memtables and then 
> reducing caches, and finally logs 0.99+ heap usage and either hangs with GC 
> failure or crashes with OutOfMemoryError.
> I've taken a heap dump and started analyzing it to find out what's wrong. The 
> memory seems to be used by the byte[] backing the HeapByteBuffer in the 
> "compressed" field of 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader. The byte[] are 
> generally 65536 bytes in size, matching the block-size of the compression.
> Looking further in the heap dump I can see that these readers are part of the 
> pool in org.apache.cassandra.io.util.CompressedPoolingSegmentedFile, which is 
> linked to the "dfile" field of org.apache.cassandra.io.sstable.SSTableReader. 
> The dump file lists 45248 instances of CompressedRandomAccessReader.
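> (For scale: assuming one un-enlarged 65536-byte buffer per pooled reader, that 
> is 45248 * 65536 bytes, roughly 2.8 GiB of retained heap out of the 8 GB heap, 
> before counting any enlarged buffers.)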
> Is this behaviour intended? Is there a leak somewhere? Or should there be an 
> alternative strategy and/or a warning for cases where a node is trying to read 
> far too many SSTables?
> EDIT:
> Searching through the code I found that PoolingSegmentedFile keeps a pool of 
> RandomAccessReaders for re-use, while CompressedRandomAccessReader allocates a 
> ByteBuffer in its constructor and (to make things worse) enlarges it if it's 
> reading a large chunk. This (sometimes enlarged) ByteBuffer is then kept alive 
> because it is part of the CompressedRandomAccessReader, which is in turn kept 
> alive as part of the pool in the PoolingSegmentedFile.
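> A minimal standalone sketch of this retention chain (the names below are 
> simplified stand-ins, not the actual Cassandra classes): the buffer is 
> allocated in the constructor, only ever grows, and stays strongly reachable 
> through the pool.
>
>     import java.nio.ByteBuffer;
>     import java.util.ArrayDeque;
>     import java.util.Deque;
>
>     // Simplified retention chain:
>     // SSTableReader.dfile -> pool -> reader -> "compressed" buffer
>     class CompressedReaderSketch {
>         private static final int CHUNK_SIZE = 65536;
>         // Allocated up-front in the constructor...
>         private ByteBuffer compressed = ByteBuffer.allocate(CHUNK_SIZE);
>
>         void readChunk(int chunkLength) {
>             // ...and enlarged for big chunks, but never shrunk or dropped.
>             if (chunkLength > compressed.capacity())
>                 compressed = ByteBuffer.allocate(chunkLength);
>         }
>     }
>
>     class PoolSketch {
>         // Every recycled reader, and therefore its buffer, stays reachable here.
>         private final Deque<CompressedReaderSketch> pool = new ArrayDeque<>();
>
>         void recycle(CompressedReaderSketch reader) { pool.push(reader); }
>     }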


