[
https://issues.apache.org/jira/browse/CASSANDRA-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358866#comment-14358866
]
Ariel Weisberg commented on CASSANDRA-8670:
-------------------------------------------
I am getting to this now. Should be fixed in 3.0. Once I have it fixed for 3.0
we can decide about back porting to 2.1.
> Large columns + NIO memory pooling causes excessive direct memory usage
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-8670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8670
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0
>
>
> If you provide a large byte array to NIO and ask it to populate the byte
> array from a socket it will allocate a thread local byte buffer that is the
> size of the requested read no matter how large it is. Old IO wraps new IO for
> sockets (but not files), so old IO is affected as well.
> Even if you are using Buffered{Input | Output}Stream you can end up passing a
> large byte array to NIO. The byte array read method will pass the array to
> NIO directly if it is larger than the internal buffer.
> Passing large cells between nodes as part of intra-cluster messaging can
> cause the NIO pooled buffers to quickly reach a high watermark and stay
> there. This ends up costing 2x the largest cell size because there is a
> buffer for input and output since they are different threads. This is further
> multiplied by the number of nodes in the cluster - 1 since each has a
> dedicated thread pair with separate thread locals.
> Anecdotally it appears that the cost is doubled beyond that, although it
> isn't clear why. Possibly the control connections, or possibly there is some
> way in which multiple
> Need a workload in CI that tests the advertised limits of cells on a cluster.
> It would be reasonable to ratchet down the max direct memory for the test to
> trigger failures if a memory pooling issue is introduced. I don't think we
> need to test concurrently pulling in a lot of them, but it should at least
> work serially.
> The obvious fix for this issue would be to read in smaller chunks when
> dealing with large values. I think "small" should still be relatively large
> (4 megabytes) so that code reading from disk can amortize the cost of a
> seek. In some of the contexts where we might switch to reading in chunks, it
> can be hard to tell what the underlying source being read from will be.
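The chunked-read fix described in the ticket could be sketched roughly as
below. This is a minimal illustration only, not Cassandra code: the class name
ChunkedReader, the readFully helper, and the use of a plain InputStream source
are all assumptions made for the example. The point is that each individual
read request handed to the stream (and, underneath, to NIO) is bounded by the
chunk size, so the per-thread pooled direct buffer never grows to the size of
the full value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch of the chunked-read workaround. Handing one huge
// byte[] to a stream read lets old IO forward the whole array to NIO, which
// sizes its thread-local direct buffer to the full request. Reading in
// bounded chunks keeps that pooled buffer at most CHUNK_SIZE.
public class ChunkedReader {
    // 4 MB: the "relatively large" chunk size suggested in the ticket, so
    // reads that hit disk can still amortize the cost of a seek.
    static final int CHUNK_SIZE = 4 * 1024 * 1024;

    // Fill dst completely, issuing reads of at most CHUNK_SIZE bytes each.
    static void readFully(InputStream in, byte[] dst) throws IOException {
        int off = 0;
        while (off < dst.length) {
            int want = Math.min(CHUNK_SIZE, dst.length - off);
            int n = in.read(dst, off, want);
            if (n < 0) {
                throw new IOException("unexpected EOF after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a large cell arriving over intra-cluster messaging.
        byte[] src = new byte[10 * 1024 * 1024];
        for (int i = 0; i < src.length; i++) src[i] = (byte) i;

        byte[] dst = new byte[src.length];
        readFully(new ByteArrayInputStream(src), dst);
        System.out.println(Arrays.equals(src, dst)); // true
    }
}
```

With a real socket stream the same loop applies; each read of at most 4 MB
caps the direct buffer NIO caches for that thread, instead of letting it grow
to the largest cell ever transferred.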
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)