This is an automated email from the ASF dual-hosted git repository.
jonwei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new b6b42d3 Minor processor quota computation fix + docs (#11783)
b6b42d3 is described below
commit b6b42d39367f1ff1d7a1aa7e0064ca0ed9c2e92f
Author: Arun Ramani <[email protected]>
AuthorDate: Fri Oct 8 20:52:03 2021 -0700
Minor processor quota computation fix + docs (#11783)
* cpu/cpuset cgroup and procfs data gathering
* Renames and default values
* Formatting
* Trigger Build
* Add cgroup monitors
* Return 0 if no period
* Update
* Minor processor quota computation fix + docs
* Address comments
* Address comments
* Fix spellcheck
Co-authored-by: arunramani-imply <[email protected]>
---
.../druid/java/util/metrics/CgroupCpuMonitor.java | 20 ++++++++++++++++----
.../java/util/metrics/CgroupCpuMonitorTest.java | 10 ++++++++++
docs/configuration/index.md | 5 ++++-
docs/operations/metrics.md | 18 ++++++++++++++++--
website/.spelling | 1 +
5 files changed, 47 insertions(+), 7 deletions(-)
diff --git a/core/src/main/java/org/apache/druid/java/util/metrics/CgroupCpuMonitor.java b/core/src/main/java/org/apache/druid/java/util/metrics/CgroupCpuMonitor.java
index 826465b..ac4d545 100644
--- a/core/src/main/java/org/apache/druid/java/util/metrics/CgroupCpuMonitor.java
+++ b/core/src/main/java/org/apache/druid/java/util/metrics/CgroupCpuMonitor.java
@@ -65,12 +65,24 @@ public class CgroupCpuMonitor extends FeedDefiningMonitor
emitter.emit(builder.build("cgroup/cpu/shares", cpuSnapshot.getShares()));
emitter.emit(builder.build(
"cgroup/cpu/cores_quota",
- cpuSnapshot.getPeriodUs() == 0
- ? 0
- : ((double) cpuSnapshot.getQuotaUs()
- ) / cpuSnapshot.getPeriodUs()
+ computeProcessorQuota(cpuSnapshot.getQuotaUs(), cpuSnapshot.getPeriodUs())
));
return true;
}
+
+ /**
+ * Calculates the total cores allocated through quotas. A negative value indicates that no quota has been specified.
+ * We use -1 because that's the default value used in the cgroup.
+ *
+ * @param quotaUs the cgroup quota value.
+ * @param periodUs the cgroup period value.
+ * @return the calculated processor quota, -1 if no quota or period set.
+ */
+ public static double computeProcessorQuota(long quotaUs, long periodUs)
+ {
+ return quotaUs < 0 || periodUs == 0
+ ? -1
+ : (double) quotaUs / periodUs;
+ }
}
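The `computeProcessorQuota` rule above can be tried outside Druid with a small standalone sketch. It assumes a cgroup v1 host where the `cpu` controller is mounted at `/sys/fs/cgroup/cpu`; that default path and the `CfsQuotaSketch` class are illustrative assumptions, not part of this commit. The sketch reads `cpu.cfs_quota_us` and `cpu.cfs_period_us` and applies the same convention of returning `-1` when no quota is set.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: derive the cores granted by a cgroup v1 CFS quota.
 * Assumes the cgroup v1 layout with the cpu controller at /sys/fs/cgroup/cpu.
 */
final class CfsQuotaSketch
{
  // Same rule as CgroupCpuMonitor.computeProcessorQuota in this patch:
  // -1 when the quota is negative (the kernel reports -1 for "unlimited")
  // or the period is 0; otherwise quota divided by period, in cores.
  static double coresQuota(long quotaUs, long periodUs)
  {
    return quotaUs < 0 || periodUs == 0 ? -1 : (double) quotaUs / periodUs;
  }

  // Reads a single integer value such as "100000\n" from a cgroup file.
  static long readLong(Path file) throws IOException
  {
    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    return Long.parseLong(text.trim());
  }

  public static void main(String[] args) throws IOException
  {
    // Optional first argument overrides the assumed cgroup v1 cpu directory.
    Path cpuDir = Paths.get(args.length > 0 ? args[0] : "/sys/fs/cgroup/cpu");
    long quotaUs = readLong(cpuDir.resolve("cpu.cfs_quota_us"));
    long periodUs = readLong(cpuDir.resolve("cpu.cfs_period_us"));
    System.out.println("cores_quota = " + coresQuota(quotaUs, periodUs));
  }
}
```

For example, a quota of 200000 us per 100000 us period yields 2.0 cores, which matches the expectation in `testQuotaCompute`.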
diff --git a/core/src/test/java/org/apache/druid/java/util/metrics/CgroupCpuMonitorTest.java b/core/src/test/java/org/apache/druid/java/util/metrics/CgroupCpuMonitorTest.java
index 4a05f5f..67c03d2 100644
--- a/core/src/test/java/org/apache/druid/java/util/metrics/CgroupCpuMonitorTest.java
+++ b/core/src/test/java/org/apache/druid/java/util/metrics/CgroupCpuMonitorTest.java
@@ -79,4 +79,14 @@ public class CgroupCpuMonitorTest
Assert.assertEquals("cgroup/cpu/cores_quota", coresEvent.get("metric"));
Assert.assertEquals(3.0D, coresEvent.get("value"));
}
+
+ @Test
+ public void testQuotaCompute()
+ {
+ Assert.assertEquals(-1, CgroupCpuMonitor.computeProcessorQuota(-1, 100000), 0);
+ Assert.assertEquals(0, CgroupCpuMonitor.computeProcessorQuota(0, 100000), 0);
+ Assert.assertEquals(-1, CgroupCpuMonitor.computeProcessorQuota(100000, 0), 0);
+ Assert.assertEquals(2.0D, CgroupCpuMonitor.computeProcessorQuota(200000, 100000), 0);
+ Assert.assertEquals(0.5D, CgroupCpuMonitor.computeProcessorQuota(50000, 100000), 0);
+ }
}
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 1d20029..c20d801 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -362,12 +362,15 @@ The following monitors are available:
|----|-----------|
|`org.apache.druid.client.cache.CacheMonitor`|Emits metrics (to logs) about the segment results cache for Historical and Broker processes. Reports typical cache statistics include hits, misses, rates, and size (bytes and number of entries), as well as timeouts and and errors.|
|`org.apache.druid.java.util.metrics.SysMonitor`|Reports on various system activities and statuses using the [SIGAR library](https://github.com/hyperic/sigar). Requires execute privileges on files in `java.io.tmpdir`. Do not set `java.io.tmpdir` to `noexec` when using `SysMonitor`.|
-|`org.apache.druid.server.metrics.HistoricalMetricsMonitor`|Reports statistics on Historical processes. Available only on Historical processes.|
|`org.apache.druid.java.util.metrics.JvmMonitor`|Reports various JVM-related statistics.|
|`org.apache.druid.java.util.metrics.JvmCpuMonitor`|Reports statistics of CPU consumption by the JVM.|
|`org.apache.druid.java.util.metrics.CpuAcctDeltaMonitor`|Reports consumed CPU as per the cpuacct cgroup.|
|`org.apache.druid.java.util.metrics.JvmThreadsMonitor`|Reports Thread statistics in the JVM, like numbers of total, daemon, started, died threads.|
+|`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup.|
+|`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup.|
+|`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup.|
|`org.apache.druid.server.metrics.EventReceiverFirehoseMonitor`|Reports how many events have been queued in the EventReceiverFirehose.|
+|`org.apache.druid.server.metrics.HistoricalMetricsMonitor`|Reports statistics on Historical processes. Available only on Historical processes.|
|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many queries have been successful/failed/interrupted.|
|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal metrics of `http` or `parametrized` emitter (see below). Must not be used with another emitter type. See the description of the metrics here: https://github.com/apache/druid/pull/4973.|
|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many ingestion tasks are currently running/pending/waiting and also the number of successful/failed tasks per emission period.|
diff --git a/docs/operations/metrics.md b/docs/operations/metrics.md
index 2d863d4..e987c1c 100644
--- a/docs/operations/metrics.md
+++ b/docs/operations/metrics.md
@@ -325,8 +325,8 @@ These metrics are only available if the SysMonitor module is included.
|`sys/swap/pageOut`|Paged out swap.||Varies.|
|`sys/disk/write/count`|Writes to disk.|fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions.|Varies.|
|`sys/disk/read/count`|Reads from disk.|fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions.|Varies.|
-|`sys/disk/write/size`|Bytes written to disk. Can we used to determine how much paging is occurring with regards to segments.|fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions.|Varies.|
-|`sys/disk/read/size`|Bytes read from disk. Can we used to determine how much paging is occurring with regards to segments.|fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions.|Varies.|
+|`sys/disk/write/size`|Bytes written to disk. One indicator of the amount of paging occurring for segments.|`fsDevName`,`fsDirName`,`fsTypeName`, `fsSysTypeName`, `fsOptions`.|Varies.|
+|`sys/disk/read/size`|Bytes read from disk. One indicator of the amount of paging occurring for segments.|`fsDevName`,`fsDirName`, `fsTypeName`, `fsSysTypeName`, `fsOptions`.|Varies.|
|`sys/net/write/size`|Bytes written to the network.|netName, netAddress, netHwaddr|Varies.|
|`sys/net/read/size`|Bytes read from the network.|netName, netAddress, netHwaddr|Varies.|
|`sys/fs/used`|Filesystem bytes used.|fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions.|< max|
@@ -336,3 +336,17 @@ These metrics are only available if the SysMonitor module is included.
|`sys/storage/used`|Disk space used.|fsDirName.|Varies.|
|`sys/cpu`|CPU used.|cpuName, cpuTime.|Varies.|
+## Cgroup
+
+These metrics are available on operating systems with the cgroup kernel feature. All the values are derived by reading from `/sys/fs/cgroup`.
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`cgroup/cpu/shares`|Relative value of CPU time available to this process. Read from `cpu.shares`.||Varies.|
+|`cgroup/cpu/cores_quota`|Number of cores available to this process. Derived from `cpu.cfs_quota_us`/`cpu.cfs_period_us`.||Varies. A value of -1 indicates there is no explicit quota set.|
+|`cgroup/memory/*`|Memory stats for this process (e.g. `cache`, `total_swap`, etc.). Each stat produces a separate metric. Read from `memory.stat`.||Varies.|
+|`cgroup/memory_numa/*/pages`|Memory stats, per NUMA node, for this process (e.g. `total`, `unevictable`, etc.). Each stat produces a separate metric. Read from `memory.num_stat`.|`numaZone`|Varies.|
+|`cgroup/cpuset/cpu_count`|Total number of CPUs available to the process. Derived from `cpuset.cpus`.||Varies.|
+|`cgroup/cpuset/effective_cpu_count`|Total number of active CPUs available to the process. Derived from `cpuset.effective_cpus`.||Varies.|
+|`cgroup/cpuset/mems_count`|Total number of memory nodes available to the process. Derived from `cpuset.mems`.||Varies.|
+|`cgroup/cpuset/effective_mems_count`|Total number of active memory nodes available to the process. Derived from `cpuset.effective_mems`.||Varies.|
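The `cgroup/cpuset/*` counts and the per-stat `cgroup/memory/*` metrics rest on two simple file formats. The sketch below uses hypothetical class and method names, not Druid's own parsers, and assumes the cgroup v1 conventions: `cpuset.cpus` and `cpuset.mems` (and their `effective_` variants) hold a comma-separated list of IDs and inclusive ranges such as `0-3,8`, while `memory.stat` holds one `key value` pair per line.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of parsing two cgroup v1 file formats:
 * the cpuset ID list format and the memory.stat key/value format.
 */
final class CgroupFormatSketch
{
  // Counts IDs in a kernel list such as "0-3,8,10-11" (cpuset.cpus, cpuset.mems, ...).
  static int countListEntries(String list)
  {
    BitSet ids = new BitSet();
    for (String part : list.trim().split(",")) {
      if (part.isEmpty()) {
        continue; // an empty file means no CPUs or memory nodes are assigned
      }
      int dash = part.indexOf('-');
      int lo = Integer.parseInt(dash < 0 ? part : part.substring(0, dash));
      int hi = dash < 0 ? lo : Integer.parseInt(part.substring(dash + 1));
      ids.set(lo, hi + 1); // ranges are inclusive; BitSet also de-duplicates overlaps
    }
    return ids.cardinality();
  }

  // Parses memory.stat-style "key value" lines; each key would back one
  // cgroup/memory/<key> metric.
  static Map<String, Long> parseKeyValueStats(String content)
  {
    Map<String, Long> stats = new LinkedHashMap<>();
    for (String line : content.split("\n")) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length == 2) {
        stats.put(fields[0], Long.parseLong(fields[1]));
      }
    }
    return stats;
  }

  public static void main(String[] args)
  {
    System.out.println(countListEntries("0-3,8,10-11\n")); // prints 7
    System.out.println(parseKeyValueStats("cache 4096\nrss 8192\n")); // prints {cache=4096, rss=8192}
  }
}
```

A `BitSet` keeps the count correct even if a list ever repeated an ID; the `memory.numa_stat` per-node format (`total=... N0=...`) is different and is not covered by this sketch.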
diff --git a/website/.spelling b/website/.spelling
index 3f7690f..705cc77 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -1713,6 +1713,7 @@ LoggingEmitter
Los_Angeles
MDC
NoopServiceEmitter
+NUMA
ONLY_EVENTS
P1D
P1W