Hello community,
here is the log from the commit of package
golang-github-vpenso-prometheus_slurm_exporter for openSUSE:Factory checked in
at 2021-07-23 23:41:06
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/golang-github-vpenso-prometheus_slurm_exporter (Old)
 and      /work/SRC/openSUSE:Factory/.golang-github-vpenso-prometheus_slurm_exporter.new.1899 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "golang-github-vpenso-prometheus_slurm_exporter"
Fri Jul 23 23:41:06 2021 rev:4 rq:907869 version:0.19
Changes:
--------
--- /work/SRC/openSUSE:Factory/golang-github-vpenso-prometheus_slurm_exporter/golang-github-vpenso-prometheus_slurm_exporter.changes 2021-03-18 22:55:39.459577319 +0100
+++ /work/SRC/openSUSE:Factory/.golang-github-vpenso-prometheus_slurm_exporter.new.1899/golang-github-vpenso-prometheus_slurm_exporter.changes 2021-07-23 23:41:19.309822893 +0200
@@ -1,0 +2,11 @@
+Thu Jul 22 13:23:12 UTC 2021 - Egbert Eich <[email protected]>
+
+- Update to version 0.19
+ * GPUs accounting has to be activated explicitly via cmd line option.
+ * Export detailed usage info for every node (CPU, Memory).
+ NOTE: With the present version of Slurm (20.11), GPU accounting
+ in the prometheus-slurm-exporter will cause the exporter to
+ terminate, thus it must not be enabled for the time being.
+- Do not ship sources.
+
+-------------------------------------------------------------------
Old:
----
golang-github-vpenso-prometheus_slurm_exporter-0.17.tar.gz
New:
----
golang-github-vpenso-prometheus_slurm_exporter-0.19.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ golang-github-vpenso-prometheus_slurm_exporter.spec ++++++
--- /var/tmp/diff_new_pack.Ial5zc/_old 2021-07-23 23:41:20.393821509 +0200
+++ /var/tmp/diff_new_pack.Ial5zc/_new 2021-07-23 23:41:20.397821503 +0200
@@ -19,7 +19,7 @@
%{go_nostrip}
Name: golang-github-vpenso-prometheus_slurm_exporter
-Version: 0.17
+Version: 0.19
Release: 0
Summary: Prometheus exporter for Slurm metrics
License: GPL-3.0-or-later
@@ -50,7 +50,7 @@
%install
%{goinstall}
-%{gosrc}
+# No %%{gosrc}
install -D -m 0644 lib/systemd/prometheus-slurm-exporter.service %{buildroot}%{_unitdir}/prometheus-slurm_exporter.service
mv %{buildroot}%{_bindir}/ %{buildroot}%{_sbindir}/
ln -s %{_sbindir}/service %{buildroot}%{_sbindir}/rcprometheus-slurm_exporter
++++++ golang-github-vpenso-prometheus_slurm_exporter-0.17.tar.gz -> golang-github-vpenso-prometheus_slurm_exporter-0.19.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/Makefile new/prometheus-slurm-exporter-0.19/Makefile
--- old/prometheus-slurm-exporter-0.17/Makefile 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/Makefile 2021-04-16 16:24:08.000000000 +0200
@@ -2,7 +2,7 @@
ifndef GOPATH
GOPATH=$(shell pwd):/usr/share/gocode
endif
-GOFILES=accounts.go cpus.go gpus.go main.go nodes.go partitions.go queue.go scheduler.go sshare.go users.go
+GOFILES=accounts.go cpus.go gpus.go main.go node.go nodes.go partitions.go queue.go scheduler.go sshare.go users.go
GOBIN=bin/$(PROJECT_NAME)
build:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/README.md new/prometheus-slurm-exporter-0.19/README.md
--- old/prometheus-slurm-exporter-0.17/README.md 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/README.md 2021-04-16 16:24:08.000000000 +0200
@@ -24,6 +24,13 @@
 - Information extracted from the SLURM [**sinfo**](https://slurm.schedmd.com/sinfo.html) and [**sacct**](https://slurm.schedmd.com/sacct.html) command.
 - [Slurm GRES scheduling](https://slurm.schedmd.com/gres.html)
+**NOTE**: since version **0.19**, GPU accounting has to be **explicitly** enabled adding the _-gpu-acct_ option to the command line otherwise it will not be activated.
+
+Be aware that:
+
+* According to issue #38, users reported that newer version of Slurm provides slightly different output and thus GPUs accounting may not work properly.
+* Users who do not have GPUs and/or do not have accounting activated may want to keep GPUs accounting **off** (see issue #45).
+
### State of the Nodes
* **Allocated**: nodes which has been allocated to one or more jobs.
@@ -41,6 +48,16 @@
 - Information extracted from the SLURM [**sinfo**](https://slurm.schedmd.com/sinfo.html) command.
+#### Additional info about node usage
+
+Since version **0.18**, the following information are also extracted and exported for **every** node known by Slurm:
+
+* CPUs: how many are _allocated_, _idle_, _other_ and in _total_.
+* Memory: _allocated_ and in _total_.
+* Labels: hostname and its Slurm status (e.g. _idle_, _mix_, _allocated_, _draining_, etc.).
+
+See the related [test data](https://github.com/vpenso/prometheus-slurm-exporter/blob/master/test_data/sinfo_mem.txt) to check the format of the information extracted from Slurm.
+
### Status of the Jobs
* **PENDING**: Jobs awaiting for resource allocation.
@@ -93,6 +110,9 @@
* the database is either down or unreachable;
* the status of the Slurm accounting DB may be inconsistent (e.g. ``sreport`` missing data, weird utilization of the cluster, etc.).
+### Share Information
+
+Collect _share_ statistics for every Slurm account. Refer to the [manpage of the sshare command](https://slurm.schedmd.com/sshare.html) to get more information.
## Installation
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/main.go new/prometheus-slurm-exporter-0.19/main.go
--- old/prometheus-slurm-exporter-0.17/main.go 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/main.go 2021-04-16 16:24:08.000000000 +0200
@@ -27,8 +27,8 @@
// Metrics have to be registered to be exposed
prometheus.MustRegister(NewAccountsCollector()) // from accounts.go
prometheus.MustRegister(NewCPUsCollector()) // from cpus.go
- prometheus.MustRegister(NewGPUsCollector()) // from gpus.go
prometheus.MustRegister(NewNodesCollector()) // from nodes.go
+ prometheus.MustRegister(NewNodeCollector()) // from node.go
prometheus.MustRegister(NewPartitionsCollector()) // from partitions.go
prometheus.MustRegister(NewQueueCollector()) // from queue.go
prometheus.MustRegister(NewSchedulerCollector()) // from scheduler.go
@@ -41,11 +41,23 @@
":8080",
"The address to listen on for HTTP requests.")
+var gpuAcct = flag.Bool(
+ "gpus-acct",
+ false,
+ "Enable GPUs accounting")
+
func main() {
flag.Parse()
+
+ // Turn on GPUs accounting only if the corresponding command line option is set to true.
+ if *gpuAcct {
+ prometheus.MustRegister(NewGPUsCollector()) // from gpus.go
+ }
+
// The Handler function provides a default handler to expose metrics
// via an HTTP server. "/metrics" is the usual endpoint for that.
log.Infof("Starting Server: %s", *listenAddress)
+ log.Infof("GPUs Accounting: %t", *gpuAcct)
http.Handle("/metrics", promhttp.Handler())
log.Fatal(http.ListenAndServe(*listenAddress, nil))
}
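
The main.go change above moves the GPU collector behind a boolean flag that defaults to false. A minimal standard-library sketch of that pattern follows. `enabledCollectors` and its list of collector names are hypothetical stand-ins for the `prometheus.MustRegister` calls, and the list only covers the collectors visible in this hunk. The flag name `gpus-acct` is taken from the diff. Note that the README hunk spells the option `-gpu-acct`, which Go's flag package would reject as an undefined flag.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// enabledCollectors is a hypothetical helper: it parses args with the same
// flag definition as main.go and returns the names of the collectors that
// would be registered. The GPU collector is opt-in.
func enabledCollectors(args []string) ([]string, error) {
	fs := flag.NewFlagSet("prometheus-slurm-exporter", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way in this sketch
	gpuAcct := fs.Bool("gpus-acct", false, "Enable GPUs accounting")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	// Always-on collectors, as far as they are visible in the diff.
	collectors := []string{"accounts", "cpus", "nodes", "node", "partitions", "queue", "scheduler"}
	if *gpuAcct {
		collectors = append(collectors, "gpus")
	}
	return collectors, nil
}

func main() {
	for _, args := range [][]string{{}, {"-gpus-acct"}, {"-gpu-acct"}} {
		collectors, err := enabledCollectors(args)
		fmt.Printf("args=%v collectors=%v err=%v\n", args, collectors, err)
	}
}
```

Registering the collector conditionally, rather than registering it and returning no metrics, keeps `sacct`-dependent code from running at all on clusters without GPUs or accounting.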
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/node.go new/prometheus-slurm-exporter-0.19/node.go
--- old/prometheus-slurm-exporter-0.17/node.go 1970-01-01 01:00:00.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/node.go 2021-04-16 16:24:08.000000000 +0200
@@ -0,0 +1,137 @@
+/* Copyright 2021 Chris Read
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+package main
+
+import (
+ "log"
+ "os/exec"
+ "sort"
+ "strconv"
+ "strings"
+
+ "github.com/prometheus/client_golang/prometheus"
+)
+
+// NodeMetrics stores metrics for each node
+type NodeMetrics struct {
+ memAlloc uint64
+ memTotal uint64
+ cpuAlloc uint64
+ cpuIdle uint64
+ cpuOther uint64
+ cpuTotal uint64
+ nodeStatus string
+}
+
+func NodeGetMetrics() map[string]*NodeMetrics {
+ return ParseNodeMetrics(NodeData())
+}
+
+// ParseNodeMetrics takes the output of sinfo with node data
+// It returns a map of metrics per node
+func ParseNodeMetrics(input []byte) map[string]*NodeMetrics {
+ nodes := make(map[string]*NodeMetrics)
+ lines := strings.Split(string(input), "\n")
+
+ // Sort and remove all the duplicates from the 'sinfo' output
+ sort.Strings(lines)
+ linesUniq := RemoveDuplicates(lines)
+
+ for _, line := range linesUniq {
+ node := strings.Fields(line)
+ nodeName := node[0]
+ nodeStatus := node[4] // mixed, allocated, etc.
+
+ nodes[nodeName] = &NodeMetrics{0, 0, 0, 0, 0, 0, ""}
+
+ memAlloc, _ := strconv.ParseUint(node[1], 10, 64)
+ memTotal, _ := strconv.ParseUint(node[2], 10, 64)
+
+
+ cpuInfo := strings.Split(node[3], "/")
+ cpuAlloc, _ := strconv.ParseUint(cpuInfo[0], 10, 64)
+ cpuIdle, _ := strconv.ParseUint(cpuInfo[1], 10, 64)
+ cpuOther, _ := strconv.ParseUint(cpuInfo[2], 10, 64)
+ cpuTotal, _ := strconv.ParseUint(cpuInfo[3], 10, 64)
+
+ nodes[nodeName].memAlloc = memAlloc
+ nodes[nodeName].memTotal = memTotal
+ nodes[nodeName].cpuAlloc = cpuAlloc
+ nodes[nodeName].cpuIdle = cpuIdle
+ nodes[nodeName].cpuOther = cpuOther
+ nodes[nodeName].cpuTotal = cpuTotal
+ nodes[nodeName].nodeStatus = nodeStatus
+ }
+
+ return nodes
+}
+
+// NodeData executes the sinfo command to get data for each node
+// It returns the output of the sinfo command
+func NodeData() []byte {
+ cmd := exec.Command("sinfo", "-h", "-N", "-O", "NodeList,AllocMem,Memory,CPUsState,StateLong")
+ out, err := cmd.Output()
+ if err != nil {
+ log.Fatal(err)
+ }
+ return out
+}
+
+type NodeCollector struct {
+ cpuAlloc *prometheus.Desc
+ cpuIdle *prometheus.Desc
+ cpuOther *prometheus.Desc
+ cpuTotal *prometheus.Desc
+ memAlloc *prometheus.Desc
+ memTotal *prometheus.Desc
+}
+
+// NewNodeCollector creates a Prometheus collector to keep all our stats in
+// It returns a set of collections for consumption
+func NewNodeCollector() *NodeCollector {
+ labels := []string{"node","status"}
+
+ return &NodeCollector{
+ cpuAlloc: prometheus.NewDesc("slurm_node_cpu_alloc", "Allocated CPUs per node", labels, nil),
+ cpuIdle: prometheus.NewDesc("slurm_node_cpu_idle", "Idle CPUs per node", labels, nil),
+ cpuOther: prometheus.NewDesc("slurm_node_cpu_other", "Other CPUs per node", labels, nil),
+ cpuTotal: prometheus.NewDesc("slurm_node_cpu_total", "Total CPUs per node", labels, nil),
+ memAlloc: prometheus.NewDesc("slurm_node_mem_alloc", "Allocated memory per node", labels, nil),
+ memTotal: prometheus.NewDesc("slurm_node_mem_total", "Total memory per node", labels, nil),
+ }
+}
+
+// Send all metric descriptions
+func (nc *NodeCollector) Describe(ch chan<- *prometheus.Desc) {
+ ch <- nc.cpuAlloc
+ ch <- nc.cpuIdle
+ ch <- nc.cpuOther
+ ch <- nc.cpuTotal
+ ch <- nc.memAlloc
+ ch <- nc.memTotal
+}
+
+func (nc *NodeCollector) Collect(ch chan<- prometheus.Metric) {
+ nodes := NodeGetMetrics()
+ for node := range nodes {
+ ch <- prometheus.MustNewConstMetric(nc.cpuAlloc, prometheus.GaugeValue, float64(nodes[node].cpuAlloc), node, nodes[node].nodeStatus)
+ ch <- prometheus.MustNewConstMetric(nc.cpuIdle, prometheus.GaugeValue, float64(nodes[node].cpuIdle), node, nodes[node].nodeStatus)
+ ch <- prometheus.MustNewConstMetric(nc.cpuOther, prometheus.GaugeValue, float64(nodes[node].cpuOther), node, nodes[node].nodeStatus)
+ ch <- prometheus.MustNewConstMetric(nc.cpuTotal, prometheus.GaugeValue, float64(nodes[node].cpuTotal), node, nodes[node].nodeStatus)
+ ch <- prometheus.MustNewConstMetric(nc.memAlloc, prometheus.GaugeValue, float64(nodes[node].memAlloc), node, nodes[node].nodeStatus)
+ ch <- prometheus.MustNewConstMetric(nc.memTotal, prometheus.GaugeValue, float64(nodes[node].memTotal), node, nodes[node].nodeStatus)
+ }
+}
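
ParseNodeMetrics in node.go discards every strconv error and indexes `node[4]` and `cpuInfo[3]` without checking lengths, so a truncated sinfo line would cause an index-out-of-range panic and non-numeric input would be recorded as 0. A hedged sketch of a stricter per-line parser follows, assuming the five whitespace-separated columns requested in NodeData. `nodeSample` and `parseNodeLine` are hypothetical names and are not part of the exporter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nodeSample is a hypothetical, flattened counterpart of NodeMetrics.
type nodeSample struct {
	name, status       string
	memAlloc, memTotal uint64
	cpu                [4]uint64 // allocated, idle, other, total
}

// parseNodeLine parses one line of
// `sinfo -h -N -O NodeList,AllocMem,Memory,CPUsState,StateLong` output
// and reports malformed input instead of ignoring it.
func parseNodeLine(line string) (nodeSample, error) {
	f := strings.Fields(line)
	if len(f) < 5 {
		return nodeSample{}, fmt.Errorf("want at least 5 fields, got %d in %q", len(f), line)
	}
	s := nodeSample{name: f[0], status: f[4]}

	var err error
	if s.memAlloc, err = strconv.ParseUint(f[1], 10, 64); err != nil {
		return nodeSample{}, fmt.Errorf("AllocMem: %w", err)
	}
	if s.memTotal, err = strconv.ParseUint(f[2], 10, 64); err != nil {
		return nodeSample{}, fmt.Errorf("Memory: %w", err)
	}

	// CPUsState has the form allocated/idle/other/total.
	parts := strings.Split(f[3], "/")
	if len(parts) != len(s.cpu) {
		return nodeSample{}, fmt.Errorf("CPUsState: want A/I/O/T, got %q", f[3])
	}
	for i, p := range parts {
		if s.cpu[i], err = strconv.ParseUint(p, 10, 64); err != nil {
			return nodeSample{}, fmt.Errorf("CPUsState: %w", err)
		}
	}
	return s, nil
}

func main() {
	for _, line := range []string{
		"b003 296960 386000 29/3/0/32 down", // line taken from test_data/sinfo_mem.txt
		"b003 296960",                       // truncated line
	} {
		s, err := parseNodeLine(line)
		fmt.Printf("%+v err=%v\n", s, err)
	}
}
```

Returning an error per line lets a caller skip or log a bad record instead of terminating the whole scrape.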
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/node_test.go new/prometheus-slurm-exporter-0.19/node_test.go
--- old/prometheus-slurm-exporter-0.17/node_test.go 1970-01-01 01:00:00.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/node_test.go 2021-04-16 16:24:08.000000000 +0200
@@ -0,0 +1,57 @@
+/* Copyright 2021 Chris Read
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+package main
+
+import (
+ "io/ioutil"
+ "testing"
+
+ "github.com/stretchr/testify/assert"
+)
+
+/*
+For this example data line:
+
+a048,79384,193000,3/13/0/16,mix
+
+We want output that looks like:
+
+slurm_node_cpus_allocated{name="a048",status="mix"} 3
+slurm_node_cpus_idle{name="a048",status="mix"} 13
+slurm_node_cpus_other{name="a048",status="mix"} 0
+slurm_node_cpus_total{name="a048",status="mix"} 16
+slurm_node_mem_allocated{name="a048",status="mix"} 79384
+slurm_node_mem_total{name="a048",status="mix"} 193000
+
+*/
+
+func TestNodeMetrics(t *testing.T) {
+ // Read the input data from a file
+ data, err := ioutil.ReadFile("test_data/sinfo_mem.txt")
+ if err != nil {
+ t.Fatalf("Can not open test data: %v", err)
+ }
+ metrics := ParseNodeMetrics(data)
+ t.Logf("%+v", metrics)
+
+ assert.Contains(t, metrics, "b001")
+ assert.Equal(t, uint64(327680), metrics["b001"].memAlloc)
+ assert.Equal(t, uint64(386000), metrics["b001"].memTotal)
+ assert.Equal(t, uint64(32), metrics["b001"].cpuAlloc)
+ assert.Equal(t, uint64(0), metrics["b001"].cpuIdle)
+ assert.Equal(t, uint64(0), metrics["b001"].cpuOther)
+ assert.Equal(t, uint64(32), metrics["b001"].cpuTotal)
+}
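
The comment block in node_test.go sketches the intended exposition output, but its metric names (`slurm_node_cpus_allocated`, label `name`) differ from the descriptors actually defined in node.go (`slurm_node_cpu_alloc`, labels `node` and `status`). Its sample line is also comma-separated, whereas ParseNodeMetrics splits on whitespace. The sketch below renders sample lines using only the names from node.go. `expositionLines` is a hypothetical helper, and its output only approximates the Prometheus text format.

```go
package main

import "fmt"

// expositionLines is a hypothetical helper that renders the six per-node
// gauges with the metric and label names defined in node.go.
func expositionLines(node, status string, cpuAlloc, cpuIdle, cpuOther, cpuTotal, memAlloc, memTotal uint64) []string {
	metrics := []struct {
		name  string
		value uint64
	}{
		{"slurm_node_cpu_alloc", cpuAlloc},
		{"slurm_node_cpu_idle", cpuIdle},
		{"slurm_node_cpu_other", cpuOther},
		{"slurm_node_cpu_total", cpuTotal},
		{"slurm_node_mem_alloc", memAlloc},
		{"slurm_node_mem_total", memTotal},
	}
	lines := make([]string, 0, len(metrics))
	for _, m := range metrics {
		lines = append(lines, fmt.Sprintf("%s{node=%q,status=%q} %d", m.name, node, status, m.value))
	}
	return lines
}

func main() {
	// Values follow the example line in the test comment: a048, 79384, 193000, 3/13/0/16, mix.
	for _, l := range expositionLines("a048", "mix", 3, 13, 0, 16, 79384, 193000) {
		fmt.Println(l)
	}
}
```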
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/nodes.go new/prometheus-slurm-exporter-0.19/nodes.go
--- old/prometheus-slurm-exporter-0.17/nodes.go 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/nodes.go 2021-04-16 16:24:08.000000000 +0200
@@ -50,8 +50,10 @@
// Walk through the slice 's' and for each value we haven't seen so far, append it to 't'.
for _, v := range s {
if _, seen := m[v]; !seen {
- t = append(t, v)
- m[v] = true
+ if len(v) > 0 {
+ t = append(t, v)
+ m[v] = true
+ }
}
}
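
The length guard added in this hunk matters for the new node.go: ParseNodeMetrics splits the sinfo output on "\n", so a trailing newline produces an empty element, `strings.Fields` returns an empty slice for it, and `node[0]` would panic. A self-contained sketch of an order-preserving de-duplication with the same guard follows. `removeDuplicates` here is an illustrative rewrite and not the exporter's exact function.

```go
package main

import "fmt"

// removeDuplicates returns the distinct, non-empty strings of s in their
// original order. Empty strings are dropped so that callers never have to
// parse a blank line.
func removeDuplicates(s []string) []string {
	seen := make(map[string]bool, len(s))
	var out []string
	for _, v := range s {
		if v == "" || seen[v] {
			continue
		}
		seen[v] = true
		out = append(out, v)
	}
	return out
}

func main() {
	lines := []string{
		"b001 327680 386000 32/0/0/32 down",
		"b001 327680 386000 32/0/0/32 down",
		"", // what a trailing newline leaves behind after splitting
	}
	fmt.Printf("%q\n", removeDuplicates(lines))
}
```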
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/test_data/sinfo_mem.txt new/prometheus-slurm-exporter-0.19/test_data/sinfo_mem.txt
--- old/prometheus-slurm-exporter-0.17/test_data/sinfo_mem.txt 1970-01-01 01:00:00.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/test_data/sinfo_mem.txt 2021-04-16 16:24:08.000000000 +0200
@@ -0,0 +1,21 @@
+a048 163840 193000 16/0/0/16 mixed
+a048 163840 193000 16/0/0/16 mixed
+a048 163840 193000 16/0/0/16 idle
+a048 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a050 163840 193000 16/0/0/16 idle
+a050 163840 193000 16/0/0/16 idle
+a050 163840 193000 16/0/0/16 idle
+a051 163840 193000 16/0/0/16 idle
+a051 163840 193000 16/0/0/16 idle
+a051 163840 193000 16/0/0/16 idle
+a052 0 193000 0/16/0/16 idle
+b001 327680 386000 32/0/0/32 down
+b001 327680 386000 32/0/0/32 down
+b002 327680 386000 32/0/0/32 down
+b002 327680 386000 32/0/0/32 idle
+b003 296960 386000 29/3/0/32 down
+b003 296960 386000 29/3/0/32 idle