[
https://issues.apache.org/jira/browse/CASSANDRA-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277953#comment-14277953
]
Benedict commented on CASSANDRA-8518:
-------------------------------------
This is one of the two methods I proposed, and I'm comfortable aiming for the
global threshold. Per-request thresholds are also a possibility and seem
reasonable. Whether we _throttle_ or simply discard some in-flight
queries on exceeding our limit is another matter, though. I would prefer to go
the route of discarding some random in-flight queries, as this brings the
system back to full health immediately, instead of letting it crawl along until
the blockage clears.
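
To make the discard option concrete, here is a minimal sketch of a global
in-flight byte budget that drops randomly chosen in-flight requests once the
limit is exceeded. The names InFlightBudget, Request, admit and complete are
hypothetical and are not part of Cassandra; a real implementation would also
need to cancel the discarded work and report an error to the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not Cassandra code.
final class InFlightBudget {
    // A request together with its estimated memory footprint in bytes.
    static final class Request {
        final String id;
        final long estimatedBytes;

        Request(String id, long estimatedBytes) {
            this.id = id;
            this.estimatedBytes = estimatedBytes;
        }
    }

    private final long limitBytes;
    private final Random random;
    private final List<Request> inFlight = new ArrayList<>();
    private long usedBytes;

    InFlightBudget(long limitBytes, Random random) {
        this.limitBytes = limitBytes;
        this.random = random;
    }

    // Admits the request, then discards random in-flight requests (possibly
    // the new one) until usage is back under the global limit.
    // Returns the requests that were discarded.
    synchronized List<Request> admit(Request request) {
        inFlight.add(request);
        usedBytes += request.estimatedBytes;
        List<Request> discarded = new ArrayList<>();
        while (usedBytes > limitBytes && !inFlight.isEmpty()) {
            Request victim = inFlight.remove(random.nextInt(inFlight.size()));
            usedBytes -= victim.estimatedBytes;
            discarded.add(victim);
        }
        return discarded;
    }

    // Releases the budget held by a finished request; a request that was
    // already discarded is ignored.
    synchronized void complete(Request request) {
        if (inFlight.remove(request)) {
            usedBytes -= request.estimatedBytes;
        }
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

Throttling would instead block in admit until enough budget is released,
which is the "crawl along" behaviour described above.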
> Cassandra Query Request Size Estimator
> --------------------------------------
>
> Key: CASSANDRA-8518
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8518
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Cheng Ren
>
> We have long been suffering from Cassandra node crashes due to out-of-memory
> errors. The heap dump from the most recent crash shows 22 native
> transport request threads, each of which consumes 3.3% of the heap, taking
> more than 70% in total.
> Heap dump:
> !https://dl-web.dropbox.com/get/attach1.png?_subject_uid=303980955&w=AAAVOoncBoZ5aOPbDg2TpRkUss7B-2wlrnhUAv19b27OUA|height=400,width=600!
> Expanded view of one thread:
> !https://dl-web.dropbox.com/get/Screen%20Shot%202014-12-18%20at%204.06.29%20PM.png?_subject_uid=303980955&w=AACUO4wrbxheRUxv8fwQ9P52T6gBOm5_g9zeIe8odu3V3w|height=400,width=600!
> The Cassandra version we are using now (2.0.4) uses MemoryAwareThreadPoolExecutor
> as the request executor and provides a default request size estimator which
> constantly returns 1, meaning it limits only the number of requests being
> pushed to the pool. To have more fine-grained control over request handling
> and to better protect our nodes from OOM issues, we propose implementing a more
> precise estimator.
> Here are our two cents:
> For update/delete/insert requests: size could be estimated by adding the sizes
> of all class members together.
> For scan queries, the major part of the request is the response, which can be
> estimated from historical data. For example, if we receive a scan query on a
> column family for a certain token range, we keep track of its response size
> and token range, to be used for estimating later scan queries on the same cf.
> For future requests on the same cf, the response size could be calculated as
> (token range * recorded size / recorded token range). The request size should
> be estimated as (query size + estimated response size).
> We believe what we're proposing here can be useful for other people in the
> Cassandra community as well. Would you mind giving us feedback? Please
> let us know if you have any concerns or suggestions regarding this proposal.
> Thanks,
> Cheng
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)