markrmiller commented on code in PR #4514:
URL: https://github.com/apache/solr/pull/4514#discussion_r3470079920


##########
solr/core/src/java/org/apache/solr/util/circuitbreaker/MemoryCircuitBreaker.java:
##########
@@ -17,75 +17,145 @@
 
 package org.apache.solr.util.circuitbreaker;
 
-import java.io.IOException;
 import java.lang.invoke.MethodHandles;
 import java.lang.management.ManagementFactory;
 import java.lang.management.MemoryMXBean;
+import java.lang.management.MemoryPoolMXBean;
+import java.lang.management.MemoryType;
+import java.lang.management.MemoryUsage;
+import java.util.ArrayList;
+import java.util.List;
 import java.util.Locale;
-import org.apache.solr.util.RefCounted;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.regex.Pattern;
+import org.apache.solr.common.util.EnvUtils;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
- * Tracks the current JVM heap usage and triggers if a moving heap usage 
average over 30 seconds
- * exceeds the defined percentage of the maximum heap size allocated to the 
JVM. Once the average
- * memory usage goes below the threshold, it will start allowing queries again.
+ * Trips when post-collection live data in the JVM heap exceeds a configured 
percentage of the
+ * maximum heap size.
  *
- * <p>The memory threshold is defined as a percentage of the maximum memory 
allocated -- see
- * memThreshold in <code>solrconfig.xml</code>.
+ * <p>The signal is read from {@link MemoryPoolMXBean#getCollectionUsage()} on 
the old/tenured heap
+ * pool, which reports memory usage immediately after the most recent 
collection that affected that
+ * pool. This is the only memory reading that distinguishes "live data" from 
"garbage waiting to be
+ * collected."
+ *
+ * <p>Earlier versions of this breaker sampled {@link 
MemoryMXBean#getHeapMemoryUsage()} on a
+ * 30-second moving average, which produced a high signal during normal 
operation: with a
+ * generational collector, {@code used} climbs toward {@code max} between 
collections — that's the
+ * steady-state shape, not a problem. The new signal updates only when an 
old-gen GC runs, which is
+ * the only point at which "how full is the heap really?" has a defined answer.
+ *
+ * <p>Pool selection by collector:
+ *
+ * <ul>
+ *   <li><b>G1 / Parallel / Serial / generational ZGC:</b> uses the pool whose 
name matches the

Review Comment:
   Yes, this is the fix, not a side change. I will say that plainly in the JIRA 
and the class javadoc.
   
   MemoryCircuitBreaker was agnostic to the pool and GC algorithms, but that 
was because it could do the wrong thing at times. The old breaker read 
`MemoryMXBean.getHeapMemoryUsage().getUsed()` That is whole-heap occupancy, 
including garbage that has not yet been collected. On a generational collector, 
that number climbs toward `max` between collections during normal operation. 
That is the steady-state shape of a healthy JVM, not heap pressure to trigger a 
circuit breaker. Using 99% of the heap is meaningless - that can be 1 second 
before a harmless GC drops it to 10%. The breaker tripped on pre-GC peaks that 
the next collection was about to reclaim.
   
   `getCollectionUsage()` on the old-gen pool reports occupancy right after the 
last collection - live data, not garbage, not live data + garbage. That is the 
only reading that answers "Is the heap actually full?"
   
   `getCollectionUsage()` is per-pool, so you have to pick a pool, and the one 
you want is the old/tenured generation where long-lived data is, and each 
collector names that differently:
   
   - G1 → `G1 Old Gen`
   - Parallel → `PS Old Gen`
   - Serial → `Tenured Gen`
   - generational ZGC → `ZGC Old Generation`
   
   Those match the `\b(Old|Tenured)\b` name pattern, so the circuit breaker 
reads getCollectionUsage() on that pool. Non-generational ZGC and Shenandoah 
are different, and some yet to be introduced collectors may be too - they have 
one combined heap pool and no old-gen name to match. When nothing matches the 
pattern, the fallback sums `getCollectionUsage()` across every `HEAP`-typed 
pool instead. So the name match plus fallback is there because the pool layout 
changes depending on the collector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to