Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package golang-github-vpenso-prometheus_slurm_exporter for openSUSE:Factory checked in at 2021-07-23 23:41:06
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/golang-github-vpenso-prometheus_slurm_exporter (Old)
 and      /work/SRC/openSUSE:Factory/.golang-github-vpenso-prometheus_slurm_exporter.new.1899 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "golang-github-vpenso-prometheus_slurm_exporter"

Fri Jul 23 23:41:06 2021 rev:4 rq:907869 version:0.19

Changes:
--------
--- /work/SRC/openSUSE:Factory/golang-github-vpenso-prometheus_slurm_exporter/golang-github-vpenso-prometheus_slurm_exporter.changes 2021-03-18 22:55:39.459577319 +0100
+++ /work/SRC/openSUSE:Factory/.golang-github-vpenso-prometheus_slurm_exporter.new.1899/golang-github-vpenso-prometheus_slurm_exporter.changes 2021-07-23 23:41:19.309822893 +0200
@@ -1,0 +2,11 @@
+Thu Jul 22 13:23:12 UTC 2021 - Egbert Eich <e...@suse.com>
+
+- Update to version 0.19
+  * GPUs accounting has to be activated explicitly via cmd line option.
+  * Export detailed usage info for every node (CPU, Memory).
+  NOTE: With the present version of Slurm (20.11), GPU accounting
+  in the prometheus-slurm-exporter will cause the exporter to
+  terminate, thus it must not be enabled for the time being.
+- Do not ship sources.
+
+-------------------------------------------------------------------

Old:
----
  golang-github-vpenso-prometheus_slurm_exporter-0.17.tar.gz

New:
----
  golang-github-vpenso-prometheus_slurm_exporter-0.19.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ golang-github-vpenso-prometheus_slurm_exporter.spec ++++++
--- /var/tmp/diff_new_pack.Ial5zc/_old 2021-07-23 23:41:20.393821509 +0200
+++ /var/tmp/diff_new_pack.Ial5zc/_new 2021-07-23 23:41:20.397821503 +0200
@@ -19,7 +19,7 @@
 %{go_nostrip}

 Name: golang-github-vpenso-prometheus_slurm_exporter
-Version: 0.17
+Version: 0.19
 Release: 0
 Summary: Prometheus exporter for Slurm metrics
 License: GPL-3.0-or-later
@@ -50,7 +50,7 @@

 %install
 %{goinstall}
-%{gosrc}
+# No %%{gosrc}
 install -D -m 0644 lib/systemd/prometheus-slurm-exporter.service %{buildroot}%{_unitdir}/prometheus-slurm_exporter.service
 mv %{buildroot}%{_bindir}/ %{buildroot}%{_sbindir}/
 ln -s %{_sbindir}/service %{buildroot}%{_sbindir}/rcprometheus-slurm_exporter

++++++ golang-github-vpenso-prometheus_slurm_exporter-0.17.tar.gz -> golang-github-vpenso-prometheus_slurm_exporter-0.19.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/Makefile new/prometheus-slurm-exporter-0.19/Makefile
--- old/prometheus-slurm-exporter-0.17/Makefile 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/Makefile 2021-04-16 16:24:08.000000000 +0200
@@ -2,7 +2,7 @@
 ifndef GOPATH
 GOPATH=$(shell pwd):/usr/share/gocode
 endif
-GOFILES=accounts.go cpus.go gpus.go main.go nodes.go partitions.go queue.go scheduler.go sshare.go users.go
+GOFILES=accounts.go cpus.go gpus.go main.go node.go nodes.go partitions.go queue.go scheduler.go sshare.go users.go
 GOBIN=bin/$(PROJECT_NAME)

 build:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/README.md new/prometheus-slurm-exporter-0.19/README.md
--- old/prometheus-slurm-exporter-0.17/README.md 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/README.md 2021-04-16 16:24:08.000000000 +0200
@@ -24,6 +24,13 @@
 - Information extracted from the SLURM [**sinfo**](https://slurm.schedmd.com/sinfo.html) and [**sacct**](https://slurm.schedmd.com/sacct.html) command.
 - [Slurm GRES scheduling](https://slurm.schedmd.com/gres.html)

+**NOTE**: since version **0.19**, GPU accounting has to be **explicitly** enabled adding the _-gpu-acct_ option to the command line otherwise it will not be activated.
+
+Be aware that:
+
+* According to issue #38, users reported that newer version of Slurm provides slightly different output and thus GPUs accounting may not work properly.
+* Users who do not have GPUs and/or do not have accounting activated may want to keep GPUs accounting **off** (see issue #45).
+
 ### State of the Nodes

 * **Allocated**: nodes which has been allocated to one or more jobs.
@@ -41,6 +48,16 @@

 - Information extracted from the SLURM [**sinfo**](https://slurm.schedmd.com/sinfo.html) command.

+#### Additional info about node usage
+
+Since version **0.18**, the following information are also extracted and exported for **every** node known by Slurm:
+
+* CPUs: how many are _allocated_, _idle_, _other_ and in _total_.
+* Memory: _allocated_ and in _total_.
+* Labels: hostname and its Slurm status (e.g. _idle_, _mix_, _allocated_, _draining_, etc.).
+
+See the related [test data](https://github.com/vpenso/prometheus-slurm-exporter/blob/master/test_data/sinfo_mem.txt) to check the format of the information extracted from Slurm.
+
 ### Status of the Jobs

 * **PENDING**: Jobs awaiting for resource allocation.
@@ -93,6 +110,9 @@
 * the database is either down or unreachable;
 * the status of the Slurm accounting DB may be inconsistent (e.g. ``sreport`` missing data, weird utilization of the cluster, etc.).

+### Share Information
+
+Collect _share_ statistics for every Slurm account. Refer to the [manpage of the sshare command](https://slurm.schedmd.com/sshare.html) to get more information.
 ## Installation

diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/main.go new/prometheus-slurm-exporter-0.19/main.go
--- old/prometheus-slurm-exporter-0.17/main.go 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/main.go 2021-04-16 16:24:08.000000000 +0200
@@ -27,8 +27,8 @@
 	// Metrics have to be registered to be exposed
 	prometheus.MustRegister(NewAccountsCollector()) // from accounts.go
 	prometheus.MustRegister(NewCPUsCollector()) // from cpus.go
-	prometheus.MustRegister(NewGPUsCollector()) // from gpus.go
 	prometheus.MustRegister(NewNodesCollector()) // from nodes.go
+	prometheus.MustRegister(NewNodeCollector()) // from node.go
 	prometheus.MustRegister(NewPartitionsCollector()) // from partitions.go
 	prometheus.MustRegister(NewQueueCollector()) // from queue.go
 	prometheus.MustRegister(NewSchedulerCollector()) // from scheduler.go
@@ -41,11 +41,23 @@
 	":8080",
 	"The address to listen on for HTTP requests.")

+var gpuAcct = flag.Bool(
+	"gpus-acct",
+	false,
+	"Enable GPUs accounting")
+
 func main() {
 	flag.Parse()
+
+	// Turn on GPUs accounting only if the corresponding command line option is set to true.
+	if *gpuAcct {
+		prometheus.MustRegister(NewGPUsCollector()) // from gpus.go
+	}
+
 	// The Handler function provides a default handler to expose metrics
 	// via an HTTP server. "/metrics" is the usual endpoint for that.
 	log.Infof("Starting Server: %s", *listenAddress)
+	log.Infof("GPUs Accounting: %t", *gpuAcct)
 	http.Handle("/metrics", promhttp.Handler())
 	log.Fatal(http.ListenAndServe(*listenAddress, nil))
 }
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/node.go new/prometheus-slurm-exporter-0.19/node.go
--- old/prometheus-slurm-exporter-0.17/node.go 1970-01-01 01:00:00.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/node.go 2021-04-16 16:24:08.000000000 +0200
@@ -0,0 +1,137 @@
+/* Copyright 2021 Chris Read
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+package main
+
+import (
+	"log"
+	"os/exec"
+	"sort"
+	"strconv"
+	"strings"
+
+	"github.com/prometheus/client_golang/prometheus"
+)
+
+// NodeMetrics stores metrics for each node
+type NodeMetrics struct {
+	memAlloc uint64
+	memTotal uint64
+	cpuAlloc uint64
+	cpuIdle uint64
+	cpuOther uint64
+	cpuTotal uint64
+	nodeStatus string
+}
+
+func NodeGetMetrics() map[string]*NodeMetrics {
+	return ParseNodeMetrics(NodeData())
+}
+
+// ParseNodeMetrics takes the output of sinfo with node data
+// It returns a map of metrics per node
+func ParseNodeMetrics(input []byte) map[string]*NodeMetrics {
+	nodes := make(map[string]*NodeMetrics)
+	lines := strings.Split(string(input), "\n")
+
+	// Sort and remove all the duplicates from the 'sinfo' output
+	sort.Strings(lines)
+	linesUniq := RemoveDuplicates(lines)
+
+	for _, line := range linesUniq {
+		node := strings.Fields(line)
+		nodeName := node[0]
+		nodeStatus := node[4] // mixed, allocated, etc.
+
+		nodes[nodeName] = &NodeMetrics{0, 0, 0, 0, 0, 0, ""}
+
+		memAlloc, _ := strconv.ParseUint(node[1], 10, 64)
+		memTotal, _ := strconv.ParseUint(node[2], 10, 64)
+
+
+		cpuInfo := strings.Split(node[3], "/")
+		cpuAlloc, _ := strconv.ParseUint(cpuInfo[0], 10, 64)
+		cpuIdle, _ := strconv.ParseUint(cpuInfo[1], 10, 64)
+		cpuOther, _ := strconv.ParseUint(cpuInfo[2], 10, 64)
+		cpuTotal, _ := strconv.ParseUint(cpuInfo[3], 10, 64)
+
+		nodes[nodeName].memAlloc = memAlloc
+		nodes[nodeName].memTotal = memTotal
+		nodes[nodeName].cpuAlloc = cpuAlloc
+		nodes[nodeName].cpuIdle = cpuIdle
+		nodes[nodeName].cpuOther = cpuOther
+		nodes[nodeName].cpuTotal = cpuTotal
+		nodes[nodeName].nodeStatus = nodeStatus
+	}
+
+	return nodes
+}
+
+// NodeData executes the sinfo command to get data for each node
+// It returns the output of the sinfo command
+func NodeData() []byte {
+	cmd := exec.Command("sinfo", "-h", "-N", "-O", "NodeList,AllocMem,Memory,CPUsState,StateLong")
+	out, err := cmd.Output()
+	if err != nil {
+		log.Fatal(err)
+	}
+	return out
+}
+
+type NodeCollector struct {
+	cpuAlloc *prometheus.Desc
+	cpuIdle *prometheus.Desc
+	cpuOther *prometheus.Desc
+	cpuTotal *prometheus.Desc
+	memAlloc *prometheus.Desc
+	memTotal *prometheus.Desc
+}
+
+// NewNodeCollector creates a Prometheus collector to keep all our stats in
+// It returns a set of collections for consumption
+func NewNodeCollector() *NodeCollector {
+	labels := []string{"node","status"}
+
+	return &NodeCollector{
+		cpuAlloc: prometheus.NewDesc("slurm_node_cpu_alloc", "Allocated CPUs per node", labels, nil),
+		cpuIdle: prometheus.NewDesc("slurm_node_cpu_idle", "Idle CPUs per node", labels, nil),
+		cpuOther: prometheus.NewDesc("slurm_node_cpu_other", "Other CPUs per node", labels, nil),
+		cpuTotal: prometheus.NewDesc("slurm_node_cpu_total", "Total CPUs per node", labels, nil),
+		memAlloc: prometheus.NewDesc("slurm_node_mem_alloc", "Allocated memory per node", labels, nil),
+		memTotal: prometheus.NewDesc("slurm_node_mem_total", "Total memory per node", labels, nil),
+	}
+}
+
+// Send all metric descriptions
+func (nc *NodeCollector) Describe(ch chan<- *prometheus.Desc) {
+	ch <- nc.cpuAlloc
+	ch <- nc.cpuIdle
+	ch <- nc.cpuOther
+	ch <- nc.cpuTotal
+	ch <- nc.memAlloc
+	ch <- nc.memTotal
+}
+
+func (nc *NodeCollector) Collect(ch chan<- prometheus.Metric) {
+	nodes := NodeGetMetrics()
+	for node := range nodes {
+		ch <- prometheus.MustNewConstMetric(nc.cpuAlloc, prometheus.GaugeValue, float64(nodes[node].cpuAlloc), node, nodes[node].nodeStatus)
+		ch <- prometheus.MustNewConstMetric(nc.cpuIdle, prometheus.GaugeValue, float64(nodes[node].cpuIdle), node, nodes[node].nodeStatus)
+		ch <- prometheus.MustNewConstMetric(nc.cpuOther, prometheus.GaugeValue, float64(nodes[node].cpuOther), node, nodes[node].nodeStatus)
+		ch <- prometheus.MustNewConstMetric(nc.cpuTotal, prometheus.GaugeValue, float64(nodes[node].cpuTotal), node, nodes[node].nodeStatus)
+		ch <- prometheus.MustNewConstMetric(nc.memAlloc, prometheus.GaugeValue, float64(nodes[node].memAlloc), node, nodes[node].nodeStatus)
+		ch <- prometheus.MustNewConstMetric(nc.memTotal, prometheus.GaugeValue, float64(nodes[node].memTotal), node, nodes[node].nodeStatus)
+	}
+}
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/node_test.go new/prometheus-slurm-exporter-0.19/node_test.go
--- old/prometheus-slurm-exporter-0.17/node_test.go 1970-01-01 01:00:00.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/node_test.go 2021-04-16 16:24:08.000000000 +0200
@@ -0,0 +1,57 @@
+/* Copyright 2021 Chris Read
+
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program. If not, see <http://www.gnu.org/licenses/>.
+*/
+
+package main
+
+import (
+	"io/ioutil"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+/*
+For this example data line:
+
+a048,79384,193000,3/13/0/16,mix
+
+We want output that looks like:
+
+slurm_node_cpus_allocated{name="a048",status="mix"} 3
+slurm_node_cpus_idle{name="a048",status="mix"} 3
+slurm_node_cpus_other{name="a048",status="mix"} 0
+slurm_node_cpus_total{name="a048",status="mix"} 16
+slurm_node_mem_allocated{name="a048",status="mix"} 179384
+slurm_node_mem_total{name="a048",status="mix"} 193000
+
+*/
+
+func TestNodeMetrics(t *testing.T) {
+	// Read the input data from a file
+	data, err := ioutil.ReadFile("test_data/sinfo_mem.txt")
+	if err != nil {
+		t.Fatalf("Can not open test data: %v", err)
+	}
+	metrics := ParseNodeMetrics(data)
+	t.Logf("%+v", metrics)
+
+	assert.Contains(t, metrics, "b001")
+	assert.Equal(t, uint64(327680), metrics["b001"].memAlloc)
+	assert.Equal(t, uint64(386000), metrics["b001"].memTotal)
+	assert.Equal(t, uint64(32), metrics["b001"].cpuAlloc)
+	assert.Equal(t, uint64(0), metrics["b001"].cpuIdle)
+	assert.Equal(t, uint64(0), metrics["b001"].cpuOther)
+	assert.Equal(t, uint64(32), metrics["b001"].cpuTotal)
+}
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/nodes.go new/prometheus-slurm-exporter-0.19/nodes.go
--- old/prometheus-slurm-exporter-0.17/nodes.go 2021-02-04 12:33:20.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/nodes.go 2021-04-16 16:24:08.000000000 +0200
@@ -50,8 +50,10 @@
 	// Walk through the slice 's' and for each value we haven't seen so far, append it to 't'.
 	for _, v := range s {
 		if _, seen := m[v]; !seen {
-			t = append(t, v)
-			m[v] = true
+			if len(v) > 0 {
+				t = append(t, v)
+				m[v] = true
+			}
 		}
 	}

diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/prometheus-slurm-exporter-0.17/test_data/sinfo_mem.txt new/prometheus-slurm-exporter-0.19/test_data/sinfo_mem.txt
--- old/prometheus-slurm-exporter-0.17/test_data/sinfo_mem.txt 1970-01-01 01:00:00.000000000 +0100
+++ new/prometheus-slurm-exporter-0.19/test_data/sinfo_mem.txt 2021-04-16 16:24:08.000000000 +0200
@@ -0,0 +1,21 @@
+a048 163840 193000 16/0/0/16 mixed
+a048 163840 193000 16/0/0/16 mixed
+a048 163840 193000 16/0/0/16 idle
+a048 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a049 163840 193000 16/0/0/16 idle
+a050 163840 193000 16/0/0/16 idle
+a050 163840 193000 16/0/0/16 idle
+a050 163840 193000 16/0/0/16 idle
+a051 163840 193000 16/0/0/16 idle
+a051 163840 193000 16/0/0/16 idle
+a051 163840 193000 16/0/0/16 idle
+a052 0 193000 0/16/0/16 idle
+b001 327680 386000 32/0/0/32 down
+b001 327680 386000 32/0/0/32 down
+b002 327680 386000 32/0/0/32 down
+b002 327680 386000 32/0/0/32 idle
+b003 296960 386000 29/3/0/32 down
+b003 296960 386000 29/3/0/32 idle
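
Note for reviewers of the new node.go: ParseNodeMetrics indexes node[4] and cpuInfo[3] directly and discards the errors returned by strconv.ParseUint, so a short or malformed sinfo line would panic or silently yield zeros. The following is a minimal sketch of a stricter per-line parser, assuming the same five whitespace-separated columns as test_data/sinfo_mem.txt (NodeList, AllocMem, Memory, CPUsState, StateLong). The names nodeLine and parseNodeLine are hypothetical and are not part of the exporter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nodeLine is a hypothetical stand-in for the exporter's NodeMetrics struct,
// extended with the node name so one value describes one parsed line.
type nodeLine struct {
	name     string
	status   string
	memAlloc uint64
	memTotal uint64
	cpuAlloc uint64
	cpuIdle  uint64
	cpuOther uint64
	cpuTotal uint64
}

// parseNodeLine parses one line in the column order
// NodeList AllocMem Memory CPUsState StateLong. It returns an error
// instead of panicking when a field is missing or is not a number.
func parseNodeLine(line string) (nodeLine, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 {
		return nodeLine{}, fmt.Errorf("expected at least 5 fields, got %d in %q", len(fields), line)
	}

	// CPUsState has the form allocated/idle/other/total.
	cpu := strings.Split(fields[3], "/")
	if len(cpu) != 4 {
		return nodeLine{}, fmt.Errorf("expected 4 CPU counters, got %q", fields[3])
	}

	// Convert the six numeric columns in a fixed order.
	raw := []string{fields[1], fields[2], cpu[0], cpu[1], cpu[2], cpu[3]}
	nums := make([]uint64, len(raw))
	for i, s := range raw {
		v, err := strconv.ParseUint(s, 10, 64)
		if err != nil {
			return nodeLine{}, fmt.Errorf("field %q: %w", s, err)
		}
		nums[i] = v
	}

	return nodeLine{
		name:     fields[0],
		status:   fields[4],
		memAlloc: nums[0],
		memTotal: nums[1],
		cpuAlloc: nums[2],
		cpuIdle:  nums[3],
		cpuOther: nums[4],
		cpuTotal: nums[5],
	}, nil
}
```

A caller could log and skip a line that fails to parse rather than terminate the exporter; which policy fits best is left open here.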