[ https://issues.apache.org/jira/browse/CASSANDRA-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032767#comment-14032767 ]

Robert Stupp commented on CASSANDRA-7402:
-----------------------------------------

Real heap/memory profiling is not possible in Sun's VM at all. It is neither 
possible to figure out the exact heap size of an object nor to inquire how 
much heap has been allocated by a thread. A lot of things (TLABs, HotSpot 
profiling) influence how, where and even whether a particular object is 
allocated in a heap region. Google has a "native extension" to the VM for 
object heap profiling, and the OpenJDK group is thinking about heap object 
allocation and "value objects" (preventing "new Long(x)" from becoming a heap 
object).

All you can do is listen for GC-MBean events and inspect the results - but 
that does not tell you "who" allocated "what". Those values also need to be 
interpreted carefully, since objects move between generations (eden, survivor, 
old, for example) or may have had a very short lifetime. Every value that a 
so-called profiler emits has to be treated just as carefully, because such 
values are in fact only estimates calculated via instrumentation, which itself 
has a significant negative performance impact.
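
For reference, a minimal sketch (not Cassandra code) of what listening for 
GC-MBean events looks like, using the standard com.sun.management notification 
API; the listener body and its logging are purely illustrative:

{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Map;

import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

import com.sun.management.GarbageCollectionNotificationInfo;

public class GcListenerSketch
{
    public static void install()
    {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            // On HotSpot every GC MXBean is also a NotificationEmitter.
            NotificationEmitter emitter = (NotificationEmitter) gc;
            emitter.addNotificationListener((notification, handback) ->
            {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                         .equals(notification.getType()))
                    return;

                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());

                // Per-pool usage before/after the collection - totals only,
                // nothing here attributes the garbage to a request or thread.
                Map<String, MemoryUsage> before = info.getGcInfo().getMemoryUsageBeforeGc();
                Map<String, MemoryUsage> after = info.getGcInfo().getMemoryUsageAfterGc();
                long usedBefore = before.values().stream().mapToLong(MemoryUsage::getUsed).sum();
                long usedAfter = after.values().stream().mapToLong(MemoryUsage::getUsed).sum();

                System.out.printf("%s (%s): %d ms, %d -> %d bytes used%n",
                                  info.getGcName(), info.getGcCause(),
                                  info.getGcInfo().getDuration(), usedBefore, usedAfter);
            }, null, null);
        }
    }
}
{code}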

IMO all that can be done without too much overhead is to simply compute 
"row count * row size + request size + communication overhead" and multiply it 
by "some factor" (which has to be determined in a controlled "lab" environment 
- painful, boring, iterative, trial-and-error work :) ).
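
A hypothetical sketch of that estimate - the method name, parameters and the 
calibration factor are made up for illustration, not proposed API:

{code}
/**
 * Rough per-request heap estimate as described above; the calibration
 * factor would have to be measured in controlled benchmarks.
 */
static long estimateRequestHeapBytes(long rowCount,
                                     long avgRowSizeBytes,
                                     long requestSizeBytes,
                                     long communicationOverheadBytes,
                                     double labCalibratedFactor)
{
    long raw = rowCount * avgRowSizeBytes + requestSizeBytes + communicationOverheadBytes;
    return (long) (raw * labCalibratedFactor);
}
{code}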

> limit the on heap memory available to requests
> ----------------------------------------------
>
>                 Key: CASSANDRA-7402
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7402
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>             Fix For: 3.0
>
>
> When running a production cluster, one common operational issue is quantifying 
> GC pauses caused by ongoing requests.
> Since different queries return varying amounts of data, you can easily get 
> yourself into a situation where a couple of bad actors in the system trigger a 
> stop-the-world pause.  Or, more likely, the aggregate garbage generated on a 
> single node across all in-flight requests causes a GC.
> We should be able to set a limit on the max heap we can allocate to all 
> outstanding requests and track the garbage per request to stop this from 
> happening.  It should increase a single node's availability substantially.
> In the yaml this would be
> {code}
> total_request_memory_space_mb: 400
> {code}
> It would also be nice to have a log of the queries that generate the most 
> garbage, so operators can track this.  A histogram would be useful as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
