[
https://issues.apache.org/jira/browse/BROOKLYN-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663322#comment-15663322
]
ASF GitHub Bot commented on BROOKLYN-375:
-----------------------------------------
Github user aledsage commented on a diff in the pull request:
https://github.com/apache/brooklyn-docs/pull/122#discussion_r87766901
--- Diff: guide/ops/troubleshooting/memory-usage.md ---
@@ -0,0 +1,138 @@
+---
+layout: website-normal
+title: "Troubleshooting: Monitoring Memory Usage"
+toc: /guide/toc.json
+---
+
+## Memory Usage
+
+Brooklyn tries to keep in memory as much history of its activity as possible,
+for displaying through the UI, so it is normal for it to consume as much memory
+as it can. It uses "soft references" so these objects will be cleared if needed,
+but **it is not a sign of anything unusual if Brooklyn is using all its available memory**.
+
+The number of active tasks, CPU usage, thread counts, and
+retention of soft reference objects are much better indications of load.
+This information can be found by looking in the log for lines containing
+`brooklyn gc`, such as:
+
+ 2016-09-16 16:19:43,337 DEBUG o.a.b.c.m.i.BrooklynGarbageCollector
[brooklyn-gc]: brooklyn gc (before) - using 910 MB / 3.76 GB memory; 98%
soft-reference maybe retention (of 362); 35 threads; tasks: 0 active, 2
unfinished; 31 remembered, 1013 total submitted)
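+
+To pull out the most recent of these lines, something like the following can be
+used (a sketch; the log file name depends on your logging configuration, with
+`brooklyn.debug.log` a common default):
+
+    # show the five most recent "brooklyn gc" status lines
+    grep "brooklyn gc" brooklyn.debug.log | tail -n 5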
+
+The soft-reference figure is indicative: the lower it is, the more the JVM
+has decided to discard items that were desirable to keep but optional.
+It only tracks some soft references (those wrapped in `Maybe`),
+and of course if there are very many such items the JVM will have to discard
+some, so a lower figure does not necessarily mean a problem.
+Typically, however, if there is no `OutOfMemoryError` (OOME) reported,
+there is no problem.
+
+
+## Problem Indicators and Resolutions
+
+Two things that *do* normally indicate a problem with memory are:
+
+* `OutOfMemoryError` exceptions being thrown
+* Memory usage high *and* CPU high, where the CPU is spent doing full garbage collection
+
+One possible cause is the JVM using a poorly chosen GC strategy,
+as described in [Oracle Java bug 6912889](http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6912889).
+This can be confirmed by running the "analyzing soft reference usage" technique below;
+memory should shrink dramatically, then increase until the problem recurs.
+This can be fixed by passing `-XX:SoftRefLRUPolicyMSPerMB=1` to the JVM,
+as described in [Brooklyn issue 375](https://issues.apache.org/jira/browse/BROOKLYN-375).
+
+Other common JVM options include `-Xms256m -Xmx1g -XX:MaxPermSize=256m`
+(depending on JVM provider and version) to set the right balance of memory allocation.
+In some cases a larger `-Xmx` value may simply be the fix
+(but this should not be the case unless many or large blueprints are being used).
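+
+For example, one common way to pass such options is via the `JAVA_OPTS`
+environment variable before launching (a sketch; whether this is honored
+depends on how you start Brooklyn):
+
+    # sketch: adjust heap sizes to suit your deployment
+    export JAVA_OPTS="-Xms256m -Xmx1g -XX:SoftRefLRUPolicyMSPerMB=1"
+    bin/brooklyn launch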
+
+If the problem is not with soft references but with real memory usage,
+the culprit is likely a memory leak, typically in blueprint design.
+An early warning of this situation is the "soft-reference maybe retention" level decreasing.
+In these situations, follow the steps described below under "Investigating Leaks".
+
+
+## Analyzing Soft Reference Usage
+
+If you are concerned about memory usage, or doing evaluation on test environments,
+the following method (in the Groovy console) can be invoked to force the system to
+reclaim as much memory as possible, including *all* soft references:
+
+    org.apache.brooklyn.util.javalang.MemoryUsageTracker.forceClearSoftReferences()
+
+In good situations, memory usage should return to a small level.
+This call can be disruptive to the system, however, so use it with care.
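+
+One way to observe the effect is to watch heap occupancy from another terminal
+while invoking it (a sketch; `<pid>` is the Brooklyn process id):
+
+    # print heap utilization percentages every second;
+    # the old generation column (O) should drop sharply after the call
+    jstat -gcutil <pid> 1000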
+
+The above method can also be configured to run automatically when memory usage
+is detected to hit a certain level. That can be useful if external policies are
+being used to warn on high memory usage, and you want to keep some headroom.
+Many JVM authorities discourage interfering with the garbage collector, however,
+so use this with care and study the particular JVM you are using.
+See the class `BrooklynGarbageCollector` for more information.
+
+
+## Investigating Leaks
+
+If a memory leak is found, the first place to look should be the WARN/ERROR logs.
+Many common causes of leaks, such as runaway tasks and cyclic dependent configuration,
+will show their own log errors prior to the memory error.
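+
+A quick way to scan for these (a sketch, assuming the default log file name and
+the log line format shown earlier):
+
+    # most recent WARN/ERROR lines in the debug log
+    grep -E " (WARN|ERROR) " brooklyn.debug.log | tail -n 20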
+
+You should also note the task counts in the `brooklyn gc` messages described above;
+if there is an exceptional number of tasks, or tasks are not clearing,
+other log messages will describe what is happening, and the in-product task
+view can indicate issues.
+
+Sometimes slow leaks can occur if blueprints do not clean up entities or locations.
+These can be diagnosed by noting the number of files written to the persistence location,
+if persistence is being used. Deploying then destroying a blueprint should not leave
+anything behind in the persistence directory.
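+
+A simple check is to count the persisted files before deploying and again after
+destroying (a sketch; the path is an assumption, substitute your configured
+persistence directory):
+
+    # run before deploy and after destroy; the two counts should match
+    find ~/.brooklyn/brooklyn-persisted-state -type f | wc -l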
+
+Where problems have been encountered in the past, we have resolved them and/or
+worked to improve logging and early identification.
+Please report any issues so that we can improve this further.
+In many cases we can also give advice on what other log `grep` patterns can be useful.
+
+
+### Standard Java Techniques
+
+Useful standard Java techniques for tracking memory leaks include:
+
+* `jstack <pid>` to see what tasks are running
+* `jmap -histo:live <pid>` to see what objects are using memory (see below)
+* Memory profilers such as VisualVM or Eclipse MAT, either connected to a running system or
+  against a heap dump generated on an OOME
+
+More information is available on [the Oracle Java web site](https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/memleaks.html).
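+
+For example, to list the top consumers in the live heap (a sketch; `<pid>` is
+the Brooklyn process id, and note that `:live` itself forces a full GC):
+
+    # header plus the twenty biggest classes by retained instances/bytes
+    jmap -histo:live <pid> | head -n 23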
+
+Note that some of the above techniques will often include soft and weak references
+that are irrelevant to the problem (and will be cleared on an OOME).
+Objects that may be cached in that way include:
+
+* `BasicConfigKey` (used for the web server and many blueprints)
+* `DslComponent` and `*Task` (used for Brooklyn activities and dependent configuration)
+* `jclouds` items including `ImageImpl` (to cache data on cloud service providers)
+
+On the other hand, any of the above may also indicate a leak.
+Taking snapshots after a `forceClearSoftReferences()` (above) invocation and comparing those
+is one technique to filter out noise. Another is to wait until there is an OOME
+and look just after, because that will clear all non-essential data from memory.
+(The `forceClearSoftReferences()` method actually works by triggering an OOME, in as safe
+a way as possible.)
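+
+For example, class histograms taken before and after clearing soft references
+can be compared (a sketch; `<pid>` is the Brooklyn process id):
+
+    jmap -histo:live <pid> > histo-before.txt
+    # ... invoke forceClearSoftReferences() in the Groovy console ...
+    jmap -histo:live <pid> > histo-after.txt
+    # classes still prominent afterwards are candidates for a real leak
+    diff histo-before.txt histo-after.txt | head -n 40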
+
+If leaked items are found, a profiler will normally let you see their content
+and walk backwards along their references to find out why they are being retained.
+
+
+### Summary of Techniques
+
+The following sequence of techniques is a common approach to investigating and fixing memory issues:
+
+* Note the log lines about `brooklyn gc`, including memory and tasks
+* Do not assume high memory usage alone is an error, as soft reference caches are deliberate;
+  use `forceClearSoftReferences()` to clear these
--- End diff --
@ahgittin (cc @neykov) I thought we were not going to recommend using
`forceClearSoftReferences()` in any kind of production environment. Can we
put in a caveat here about not using it in production? I'd be extremely cautious
about encouraging real users to call this until devs have been using it in
anger themselves a lot.
With the use of the (much safer) `-XX:SoftRefLRUPolicyMSPerMB=1`, I'd
expect the need for calling this would be greatly reduced.
> Brooklyn intermittently uses high CPU levels and becomes unresponsive
> ---------------------------------------------------------------------
>
> Key: BROOKLYN-375
> URL: https://issues.apache.org/jira/browse/BROOKLYN-375
> Project: Brooklyn
> Issue Type: Bug
> Environment: OSX Sierra, Java 1.7
> Reporter: Duncan Godwin
>
> Intermittently whilst launching a clocker swarm within brooklyn, it uses high
> CPU levels and becomes unresponsive. This was noted when testing failover by
> manually stopping some nodes with `shutdown -h`.
> [jstack 1|https://gist.github.com/drigodwin/c5946d23ed11350f393d9ba9b80a2a2d]
> [jstack 2|https://gist.github.com/drigodwin/5619b02c0c1d53ceb0c99234d8f0dd96]
> [jclouds.debug.log|https://gist.github.com/drigodwin/365d39d216e6a56c634a5020496ef8f1]