[
https://issues.apache.org/jira/browse/CASSANDRA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Kearney updated CASSANDRA-4265:
-------------------------------------
Description:
When using rpc_server_type: hsha there seems to be no limit on the number of open connections that Cassandra accepts, or on the total memory they consume. This can lead to OOM errors, since the HSHA server assigns a FrameBuffer per connection which is only cleaned up when the connection is closed.
Setup:
I wrote a simple test app using Hector which iterated through my rows retrieving data. If my Hector connection pool size was set high (in this case 100), then after a while Cassandra would crash with an OOM error. The program was sequential, so only one connection was actually in use at any one time, but from what I can tell (and from MAT analysis) all of the open connections were consuming memory as well (their FrameBuffers).
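For reference, the scan in my test app looked roughly like the following (Hector 1.1-0 API written from memory, so treat it as a sketch; the cluster, keyspace and column family names are placeholders, not my actual schema):

{code:java}
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class SequentialScan {
    public static void main(String[] args) {
        // The node under test has rpc_server_type: hsha set in cassandra.yaml.
        CassandraHostConfigurator config = new CassandraHostConfigurator("localhost:9160");
        config.setMaxActive(100); // large pool: 100 open connections, each with its own FrameBuffer on the server

        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", config);
        Keyspace keyspace = HFactory.createKeyspace("TestKeyspace", cluster); // placeholder keyspace

        StringSerializer ss = StringSerializer.get();
        // The real app paged through the whole data set; repeatedly issuing
        // reasonably large reads is enough to grow a buffer on every pooled
        // connection, because each execute() may borrow a different connection.
        for (int i = 0; i < 10000; i++) {
            RangeSlicesQuery<String, String, String> query =
                    HFactory.createRangeSlicesQuery(keyspace, ss, ss, ss)
                            .setColumnFamily("Data")   // placeholder column family
                            .setKeys("", "")
                            .setRange("", "", false, 1000)
                            .setRowCount(500);
            OrderedRows<String, String, String> rows = query.execute().get();
            for (Row<String, String, String> row : rows) {
                row.getColumnSlice().getColumns().size(); // touch the data
            }
        }

        HFactory.shutdownCluster(cluster);
    }
}
{code}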
At the moment all the solutions to this OOM problem seem to rest with the client. The memory consumed on a node is equal to [open connections] * [max request/response size] (it is the max because of https://issues.apache.org/jira/browse/THRIFT-1457: a buffer's capacity is never reset once it has grown). This means the client needs to know how much memory each node has spare for its connection pool to use up. If you have a distributed set of clients, they would also have to coordinate on how many open connections they hold per node.
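To put a number on that: with 100 open connections and, say, a 16 MB cap on request/response size per connection (substitute whatever frame/message size limit is actually configured), the worst case is already well past a small dev heap:

{code:java}
public class WorstCaseRpcMemory {
    public static void main(String[] args) {
        int openConnections = 100;      // e.g. one Hector pool of 100 against a single node
        long maxFrameBytes = 16L << 20; // assumed max request/response size per connection (16 MB)

        // Because of THRIFT-1457 each connection's buffer stays at the largest
        // size it has ever needed, so the worst case is simply the product.
        long worstCase = openConnections * maxFrameBytes;
        System.out.printf("Worst case buffer memory: %d MB%n", worstCase >> 20); // 1600 MB, versus a 512 MB - 2 GB dev heap
    }
}
{code}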
I was only testing on a dev machine with small heap sizes (512 MB to 2 GB), but the memory consumed is, as stated, a function of the number of connections and the buffer size, so the problem scales to larger heap sizes as well.
Solutions:
The simplest would be a limit on the number of connections the HSHA server accepts. I only started looking into Cassandra a few days ago, but I tried a very simple connection-limit mechanism (I will attach the patch), which seemed to work. I'm sure it can be done much more cleanly than my version.
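The attached patch is the actual change; purely to illustrate the mechanism, the idea is no more than gating accepts on a counter and releasing it when the connection closes. A standalone sketch (plain java.nio, not the real Thrift selector code, and all names here are made up):

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.Semaphore;

// Illustration only: refuse new connections once maxOpenConnections is reached,
// and free the slot when a connection goes away.
public class BoundedAcceptLoop {
    private final Semaphore slots;
    private final ServerSocketChannel server;

    public BoundedAcceptLoop(int port, int maxOpenConnections) throws IOException {
        this.slots = new Semaphore(maxOpenConnections);
        this.server = ServerSocketChannel.open();
        this.server.socket().bind(new InetSocketAddress(port));
    }

    public void run() throws IOException {
        while (true) {
            final SocketChannel client = server.accept(); // blocking accept for simplicity
            if (!slots.tryAcquire()) {
                client.close(); // over the limit: drop the connection straight away
                continue;
            }
            new Thread(new Runnable() {
                public void run() {
                    try {
                        // ... read request, write response ...
                    } finally {
                        slots.release(); // give the slot back when the connection is done
                        try { client.close(); } catch (IOException ignored) { }
                    }
                }
            }).start();
        }
    }
}
{code}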
With a connection limit in place, the client or clients only need to enforce a hard limit on their maximum request size (let's say 2 MB). Then, for each node, you know that allowing 100 connections could use up to about 200 MB of memory, and you can tune that number per node. (This isn't perfect, since clients can't always tell exactly how big a serialised response will be, so you can still end up above the 200 MB.)
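On the client side that tuning is just arithmetic plus the per-node pool setting, e.g. with Hector (the 200 MB budget and 2 MB cap are made-up numbers):

{code:java}
import me.prettyprint.cassandra.service.CassandraHostConfigurator;

public class PoolSizing {
    public static void main(String[] args) {
        long perNodeBudgetBytes = 200L << 20; // heap on the node this client is allowed to pin (assumed)
        long maxRequestBytes = 2L << 20;      // hard cap this client enforces on its own request size (assumed)

        int maxConnections = (int) (perNodeBudgetBytes / maxRequestBytes); // = 100

        CassandraHostConfigurator config = new CassandraHostConfigurator("node1:9160");
        config.setMaxActive(maxConnections); // Hector's per-node connection pool limit
    }
}
{code}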
A more complex solution might remove the burden from the client completely. Thrift doesn't have streaming support, but I assume that when Cassandra reads data from disk / memtables, streaming can be done at that point. If so, you could monitor how much memory client connections are consuming in total; if a request comes in and buffering the request or the response would push Cassandra over the limit, you could send an error back instead of servicing the request.
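Just to sketch the shape of that server-side check (the class and method names are made up, and the real accounting would have to hook into the Thrift frame handling):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical admission check: track the total bytes currently held in
// request/response buffers and refuse a frame up front if accepting it
// would exceed a global budget.
public class RpcBufferBudget {
    private final long maxBufferedBytes;
    private final AtomicLong bufferedBytes = new AtomicLong();

    public RpcBufferBudget(long maxBufferedBytes) {
        this.maxBufferedBytes = maxBufferedBytes;
    }

    /** Called when a frame header announces an incoming request of frameSize bytes. */
    public boolean tryReserve(long frameSize) {
        while (true) {
            long current = bufferedBytes.get();
            if (current + frameSize > maxBufferedBytes) {
                return false; // caller should send an error instead of buffering the request
            }
            if (bufferedBytes.compareAndSet(current, current + frameSize)) {
                return true;
            }
        }
    }

    /** Called once the buffered request/response for that frame has been released. */
    public void release(long frameSize) {
        bufferedBytes.addAndGet(-frameSize);
    }
}
{code}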
> Limit total open connections (HSHA server)
> ------------------------------------------
>
> Key: CASSANDRA-4265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4265
> Project: Cassandra
> Issue Type: Improvement
> Components: API
> Affects Versions: 1.0.10, 1.1.0
> Environment: Ubuntu 10.04, 64bit, Oracle 1.6.0_32 and OpenJDK 1.6.0_20
> Connecting with Hector 1.1-0
> Reporter: James Kearney
> Priority: Minor
> Labels: thrift
> Attachments: 0001-Limit-HSHA-open-connections.patch
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira