[ https://issues.apache.org/jira/browse/CASSANDRA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4265:
--------------------------------------

    Fix Version/s:     (was: 1.2.1)
    
> Limit total open connections (HSHA server)
> ------------------------------------------
>
>                 Key: CASSANDRA-4265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4265
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>    Affects Versions: 0.8.3
>         Environment: Ubuntu 10.04, 64bit, Oracle 1.6.0_32 and OpenJDK 1.6.0_20
> Connecting with Hector 1.1-0
>            Reporter: James Kearney
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>              Labels: thrift
>         Attachments: 0001-Limit-HSHA-open-connections.patch
>
>
> When using rpc_server_type: hsha, there seems to be no limit on the number 
> of open connections that Cassandra accepts, nor on the total memory they 
> consume. This can lead to OOM errors, since the HSHA server assigns a 
> FrameBuffer per connection which is only cleaned up when the connection is 
> closed.
> Setup:
> I wrote a simple test app using Hector which iterated through my rows 
> retrieving data. If my Hector connection pool size was set high (100 in 
> this case), then after a while Cassandra would crash with an OOM. The 
> program was sequential, so only one connection was actually in use at any 
> one time, but from what I can tell (and from MAT heap analysis) all the 
> open connections were consuming memory as well (their FrameBuffers).
> At the moment, all the solutions to this OOM problem seem to rest with the 
> client. The memory consumed on a node is equal to [open connections] * [max 
> request/response size] (it is the max because of 
> https://issues.apache.org/jira/browse/THRIFT-1457).
> This means the client needs to know how much memory each node has spare for 
> it to use up with its connection pool. If you have a distributed set of 
> clients, they would have to coordinate on how many open connections they 
> have per node.
> I was only testing on a dev machine with small heap sizes (512 MB-2 GB), 
> but since the memory consumed is, as stated, a function of connection count 
> and buffer size, the problem scales up to larger heap sizes as well.
> Solutions:
> The simplest would be a limit on the number of connections the HSHA server 
> accepts. I only started looking into Cassandra a few days ago, but I tried 
> a very simple connection limit mechanism (patch attached) which seemed to 
> work. I'm sure it can be done much more cleanly than my version.
> With this, the client (or clients) only need a hard limit on their max 
> request size (let's say 2 MB). Then, for each node, you know that allowing 
> 100 connections would potentially use up 200 MB of memory, and you can tune 
> that number per node. (This isn't perfect, since clients can't always tell 
> exactly how big a serialised response will be, so you can still go above 
> the 200 MB.)
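The kind of hard cap described here can be sketched with a semaphore guarding the accept path. This is only an illustration with invented names, not the attached patch, which hooks into Thrift's nonblocking server internals:

```java
import java.util.concurrent.Semaphore;

// Sketch of a connection cap for an accept loop (hypothetical; the attached
// 0001-Limit-HSHA-open-connections.patch integrates with Thrift directly).
public class ConnectionLimiter {
    private final Semaphore slots;

    public ConnectionLimiter(int maxConnections) {
        this.slots = new Semaphore(maxConnections);
    }

    // Called before accepting a new connection; false means "reject".
    public boolean tryAcquire() {
        return slots.tryAcquire();
    }

    // Called when a connection is closed, freeing its slot.
    public void release() {
        slots.release();
    }

    public static void main(String[] args) {
        ConnectionLimiter limiter = new ConnectionLimiter(2);
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // true
        System.out.println(limiter.tryAcquire()); // false -- cap reached
    }
}
```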
> A more complex solution might remove the burden from the client completely. 
> Thrift doesn't have streaming support, but I assume that when Cassandra 
> reads data from disk / memtables, streaming could be done at that point. If 
> that is the case, you could monitor how much memory client connections are 
> consuming in total; if buffering an incoming request (or its response) 
> would push Cassandra over the limit, you could send an error back instead 
> of servicing the request.
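The server-side idea in that last paragraph amounts to a global byte budget that each request reserves against before being buffered. Again only a sketch under that assumption; the class and method names are invented and no such accounting exists in this version of Cassandra:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a global buffer-memory budget: a request reserves its frame size
// before being buffered, and is refused (error sent to the client) if the
// reservation would push total usage over the limit. (Hypothetical.)
public class BufferBudget {
    private final long limitBytes;
    private final AtomicLong used = new AtomicLong();

    public BufferBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Atomically reserve frameBytes; false means "reject this request".
    public boolean tryReserve(long frameBytes) {
        while (true) {
            long current = used.get();
            if (current + frameBytes > limitBytes) {
                return false; // would exceed the budget
            }
            if (used.compareAndSet(current, current + frameBytes)) {
                return true; // reservation succeeded
            }
            // else another thread raced us; retry with the fresh value
        }
    }

    // Release a reservation once the request/response buffers are freed.
    public void free(long frameBytes) {
        used.addAndGet(-frameBytes);
    }

    public static void main(String[] args) {
        BufferBudget budget = new BufferBudget(200L * 1024 * 1024); // 200 MB cap
        System.out.println(budget.tryReserve(150L * 1024 * 1024)); // true
        System.out.println(budget.tryReserve(100L * 1024 * 1024)); // false -- over budget
    }
}
```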

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
