[
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307856#comment-17307856
]
Cassandra Targett commented on SOLR-15056:
------------------------------------------
The documentation being part of the patch isn't a problem for me at all; as a
project I feel we should prefer doc updates to be made with the code changes in
the same patch or pull request. I can't always look at doc changes before they
are committed, but whether I do it pre- or post-commit it's always helpful to
have all the changes related to an issue together.
These changes look fine to me _as long as they reflect the code changes_. The
grammar changes are improvements IMO, but I can only assume that they
accurately reflect the code changes.
Atri, from your perspective, I think you'd want to verify the factual aspects
of doc changes. For example, if the docs now say that the circuit breaker
tracks the {{OperatingSystemMXBean.getSystemCpuLoad()}}, you'd want to check
that is true based on the content of the patch. There's no way I have time to
track down every doc change into the code, I expect the code reviewers to have
done that (and IMO that's another reason why the doc changes being part of the
patch/PR is helpful).
I have other minor nits about the doc changes - mainly that I try to put values
for parameters in monospace instead of quotes as they are here, but I wouldn't
let that hold up committing this if or when it happens to be ready.
> CPU circuit breaker needs to use CPU utilization, not Unix load average
> -----------------------------------------------------------------------
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
> Issue Type: Bug
> Components: metrics
> Affects Versions: 8.7
> Reporter: Walter Underwood
> Assignee: Atri Sharma
> Priority: Major
> Labels: Metrics
> Attachments:
> 0001-SOLR-15056-Circuit-Breakers-use-CPU-utilization-inst.patch,
> 0002-SOLR-15056-clean-up-linkage-to-SolrCore-add-back-loa.patch,
> SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of
> the count of processes waiting to run. It is effectively unbounded. I've seen
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle
> >during the recent period of time observed, while a value of 1.0 means that
> >all CPUs were actively running 100% of the time during the recent period
> >being observed. All values betweens 0.0 and 1.0 are possible depending of
> >the activities going on in the system. If the system recent cpu usage is not
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the
> memory and CPU circuit breakers.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)