I ran this on:

openjdk version "21.0.5" 2024-10-15 LTS
OpenJDK Runtime Environment (Red_Hat-21.0.5.0.11-1) (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-21.0.5.0.11-1) (build 21.0.5+11-LTS, mixed mode, sharing)


Agreed, the JDK reports are marked as fixed in Java 18, but the issue does
not seem to be resolved: the metric is still reporting the wrong value.


I ran tests on a 4-core machine where GCP monitoring shows average CPU usage
around 25-30% (no restriction on cpuQuota), yet the metric I pulled using

http://search-solr:8983/solr/admin/metrics?prefix=os.systemCpuLoad&wt=json

reports a value close to 1.0, and Solr rejects the requests with 429.
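
For anyone who wants to compare the readings side by side on their own node,
here is a minimal sketch (not part of Solr). It prints the JVM's system-wide
CPU load, the process CPU load, and the OS load average together. The class
name CpuMetricsProbe and the helper perCoreLoad are made up for illustration.
It assumes JDK 14 or newer, where getCpuLoad() is the non-deprecated
replacement for getSystemCpuLoad().

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic class for illustration; not part of Solr.
class CpuMetricsProbe {

    // Hypothetical helper: load average divided by the number of cores.
    // Returns -1.0 when the load average is unavailable (negative)
    // or the core count is not positive.
    static double perCoreLoad(double loadAverage, int cores) {
        if (loadAverage < 0 || cores <= 0) {
            return -1.0;
        }
        return loadAverage / cores;
    }

    public static void main(String[] args) throws InterruptedException {
        OperatingSystemMXBean os =
            (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        int cores = Runtime.getRuntime().availableProcessors();

        // Take five samples, one second apart.
        for (int i = 0; i < 5; i++) {
            double cpuLoad = os.getCpuLoad();           // whole system, 0.0-1.0, negative if unavailable
            double processLoad = os.getProcessCpuLoad(); // this JVM only, 0.0-1.0, negative if unavailable
            double loadAvg = os.getSystemLoadAverage();  // last-minute load average, negative if unavailable
            System.out.printf(
                "cpuLoad=%.2f processCpuLoad=%.2f loadAvg=%.2f perCoreLoad=%.2f cores=%d%n",
                cpuLoad, processLoad, loadAvg, perCoreLoad(loadAvg, cores), cores);
            Thread.sleep(1000);
        }
    }
}
```

It can be run directly with "java CpuMetricsProbe.java" inside the same
container as Solr. Per the Javadoc, getCpuLoad() is already a fraction between
0.0 and 1.0, so dividing it by availableProcessors() would not be meaningful;
the load average is the value where a per-core normalization applies.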

On Tue, Jul 29, 2025 at 7:13 PM Gus Heck <gus.h...@gmail.com> wrote:

> Looks like most of those JDK reports are fixed in Java 18. What version was
> the OP on?
>
> On Tue, Jul 29, 2025 at 7:49 AM Jason Gerlowski <gerlowsk...@gmail.com>
> wrote:
>
> > Hi Puneet,
> >
> > It certainly looks like there are a lot of bugs in load-average
> > reporting - I never realized it was so shaky in those containerized
> > environments!  Thanks for the thorough writeup.
> >
> > The question is what to do about it.  On the one hand "load average"
> > is only one of several circuit breakers that Solr offers, and it's
> > likely still providing value for folks who happen to run in
> > non-containerized environments.  Maybe the best thing to do is to
> > update our docs to highlight these limitations, and suggest folks
> > running in Kubernetes, etc. steer clear of the load-avg circuit
> > breaker?
> >
> > Would you be willing to file a JIRA ticket to summarize the problem
> > and propose how it might be addressed?
> >
> > > OperatingSystemMXBean.getSystemCpuLoad() consistently reports values
> > > *close to 1.0 (100%)*.
> >
> > You may know this already, but to highlight it for others: a CPU Load
> > of 1.0 doesn't imply utilization of 100%.
> >
> > CPU Load, or load-average, is a measure of how many processes are
> > currently using or waiting for a CPU.  It's a distinct metric from CPU
> > utilization, which measures what percentage of time your CPUs are
> > utilized.
> >
> > So having a CPU load of 1.0 and utilization of 20-30% isn't necessarily
> > wrong or contradictory.  It may be correct.  (I would say "Those
> > values are correct", if not for all of the issue-tracker links you
> > shared above, which make a compelling theoretical case.)
> >
> > Best,
> >
> > Jason
> >
> > On Mon, Jul 28, 2025 at 2:47 PM PUNEET SHARMA
> > <puneetsharmaps...@gmail.com> wrote:
> > >
> > > Hi Team,
> > >
> > > Currently Solr's CPU circuit breaker mechanism relies on CPU load
> > > metrics obtained from the Java OperatingSystemMXBean. However, in some
> > > environments (notably when running on cloud platforms like Google Cloud
> > > Platform - GCP), this metric inaccurately reports CPU usage, causing the
> > > circuit breaker to trip unnecessarily. Here is the observed issue, root
> > > cause, supporting references, and a diagnostic utility used to
> > > investigate the problem.
> > >
> > > Solr’s CPU circuit breaker is using
> > > com.sun.management.OperatingSystemMXBean.getSystemCpuLoad() to monitor
> > > CPU usage. These metrics have been observed to return misleading values:
> > >
> > >    - GCP monitoring shows average Solr CPU usage around *25-30%*.
> > >    - OperatingSystemMXBean.getSystemCpuLoad() consistently reports values
> > >      *close to 1.0 (100%)*.
> > >    - As a result, Solr’s CPU circuit breaker falsely assumes high load and
> > >      prematurely *trips*, potentially impacting service availability or
> > >      throttling requests unnecessarily.
> > >
> > > This discrepancy arises from a change in how CPU metrics are calculated
> > > in the JDK.
> > >
> > > cgroup configs:
> > >
> > > CPUUsageNSec=378033177304000
> > > CPUAccounting=yes
> > > CPUWeight=[not set]
> > > StartupCPUWeight=[not set]
> > > CPUShares=[not set]
> > > StartupCPUShares=[not set]
> > > CPUQuotaPerSecUSec=infinity
> > > CPUQuotaPeriodUSec=infinity
> > > LimitCPU=infinity
> > > LimitCPUSoft=infinity
> > > CPUSchedulingPolicy=0
> > > CPUSchedulingPriority=0
> > > CPUAffinityFromNUMA=no
> > > CPUSchedulingResetOnFork=no
> > >
> > > *Relevant JDK Bugs and Fixes*
> > >
> > > *JDK-8248215*
> > >
> > >    - *Title*: Improve OperatingSystemMXBean API to report CPU load
> > >      information for containers
> > >    - *Link*: JDK-8248215 <https://bugs.openjdk.org/browse/JDK-8248215>
> > >    - *Summary*: Introduced enhancements to better support reporting of CPU
> > >      metrics inside containerized environments.
> > >
> > > *JDK-8269851*
> > >
> > >    - *Title*: OperatingSystemMXBean getSystemCpuLoad reports incorrect value
> > >      inside a container
> > >    - *Link*: JDK-8269851 <https://bugs.openjdk.org/browse/JDK-8269851>
> > >    - *Commit*: GitHub PR
> > >      <https://github.com/openjdk/jdk/commit/25f00d787cf56f6cdca6949115d04e7d8e675554#diff-2bc4c3408fc6fae6e133b8ffd644b933dcbe372cf249547d4c49ed94444c9735R45-R282>
> > >    - *Impact*: Introduced changes that affect the internal behavior of
> > >      getSystemCpuLoad() and getProcessCpuLoad(). After this change, the
> > >      reported CPU usage may not correctly reflect real CPU usage inside
> > >      containers.
> > >
> > >
> > > To verify the discrepancy, I added a class within Solr to print out
> > > real-time CPU load metrics as seen by the JVM.
> > >
> > > *MonitorCpu.java*
> > >
> > > // To compile:
> > > // javac /path/to/solr/core/src/java/org/apache/solr/util/circuitbreaker/MonitorCpu.java
> > > // To run:
> > > // java -cp /path/to/solr/core/src/java org.apache.solr.util.circuitbreaker.MonitorCpu
> > >
> > > package org.apache.solr.util.circuitbreaker;
> > >
> > > import com.sun.management.OperatingSystemMXBean;
> > > import java.lang.management.ManagementFactory;
> > >
> > > public class MonitorCpu {
> > >     public static void main(String[] args) {
> > >         OperatingSystemMXBean osBean =
> > >             (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
> > >
> > >         while (true) {
> > >             // Or getProcessCpuLoad() for the JVM process alone
> > >             double cpuLoad = osBean.getSystemCpuLoad();
> > >             System.out.printf("Current CPU load: %.2f%n", cpuLoad);
> > >
> > >             try {
> > >                 Thread.sleep(1000); // Pause to reduce output rate
> > >             } catch (InterruptedException e) {
> > >                 Thread.currentThread().interrupt();
> > >                 return; // Stop polling once interrupted
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > > *Observations from Execution*
> > >
> > >    - The printed cpuLoad value often fluctuates near *1.0*, despite actual
> > >      CPU load being far lower.
> > >    - This confirms the mismatch between Java-reported CPU metrics and actual
> > >      usage observed via system tools or GCP monitoring.
> > >
> > > *Implications for Solr*
> > >
> > >    - Solr's CPU circuit breaker, relying on these metrics, is *misled into
> > >      believing the node is under high load*.
> > >    - This can cause *premature degradation* or *request throttling*, even when
> > >      system resources are sufficient.
> > >    - This is especially critical in *containerized* or *cloud-native* deployments
> > >      (e.g., Kubernetes, GKE), where resource quotas and visibility differ from
> > >      traditional environments.
> > >
> > >
> > >
> > > Is anyone else facing this issue with the Solr CPU circuit breaker?
> > >
> > > Should we change the metric used in Solr circuit breakers?
> > >
> > > Can we divide the current metric by the number of available processors
> > > (Runtime.getRuntime().availableProcessors()) to get the correct value?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >
> >
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>
