Hello community,

here is the log from the commit of package slurm.15119 for openSUSE:Leap:15.1:Update checked in at 2020-11-26 14:49:24
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Leap:15.1:Update/slurm.15119 (Old)
 and      /work/SRC/openSUSE:Leap:15.1:Update/.slurm.15119.new.5913 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Package is "slurm.15119"

Thu Nov 26 14:49:24 2020 rev:1 rq:850624 version:18.08.9

Changes:
--------
New Changes file:

--- /dev/null 2020-11-18 17:46:03.679371574 +0100
+++ /work/SRC/openSUSE:Leap:15.1:Update/.slurm.15119.new.5913/slurm.changes 2020-11-26 14:49:29.533638316 +0100
@@ -0,0 +1,1451 @@
+-------------------------------------------------------------------
+Wed Nov 18 10:15:39 UTC 2020 - Ana Guerrero Lopez <[email protected]>
+
+- PMIx - fix potential buffer overflows from use of unpackmem().
+  CVE-2020-27745 (bsc#1178890)
+  * PMIx-fix-potential-buffer-overflows-from-use-of-unpackmen_CVE-2020-27745.patch
+- X11 forwarding - fix potential leak of the magic cookie when sent as an
+  argument to the xauth command. CVE-2020-27746 (bsc#1178891)
+  * X11-forwarding-avoid-unsafe-use-of-magic-cookie_CVE-2020-27746.patch
+- More information at https://lists.schedmd.com/pipermail/slurm-announce/2020/000045.html
+
+-------------------------------------------------------------------
+Tue Jul 7 08:03:15 UTC 2020 - Egbert Eich <[email protected]>
+
+- Fix Authentication Bypass when Message Aggregation is enabled CVE-2020-12693
+  This fixes and issue where authentication could be bypassed via an alternate
+  path or channel when message Aggregation was enabled.
+  A race condition allowed a user to launch a process as an arbitrary user.
+  Add: Fix-Authentication-Bypass-when-Message-Aggregation-is-enabled-CVE-2020-12693.patch
+  (CVE-2020-12693, bsc#1172004).
+- Remove unneeded build dependency to postgresql-devel.
+
+-------------------------------------------------------------------
+Thu Jan 2 09:14:56 UTC 2020 - Egbert Eich <[email protected]>
+
+- Deprecate "ControlMachine" only for SLURM version upgrades and
+  products newer than 1501. This ensures that the original setting
+  is retained for the SLURM version shipped origianlly with SLE-15-SP1
+  or Leap 15.1.
+
+-------------------------------------------------------------------
+Sat Dec 21 09:07:42 UTC 2019 - Egbert Eich <[email protected]>
+
+- Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692).
+  * Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block.
+  * Make sview work with glib2 v2.62.
+  * Make Slurm compile on linux after sys/sysctl.h was deprecated.
+  * Install slurmdbd.conf.example with 0600 permissions to encourage secure
+    use. CVE-2019-19727.
+  * srun - do not continue with job launch if --uid fails. CVE-2019-19728.
+
+-------------------------------------------------------------------
+Wed Dec 11 18:23:46 UTC 2019 - Christian Goll <[email protected]>
+
+- added pmix support jsc#SLE-10800
+
+-------------------------------------------------------------------
+Sun Dec 8 11:33:42 UTC 2019 - Egbert Eich <[email protected]>
+
+- Use --with-shared-libslurm to build slurm binaries using libslurm.
+- Make libslurm depend on slurm-config.
+
+-------------------------------------------------------------------
+Fri Dec 6 17:06:32 UTC 2019 - Egbert Eich <[email protected]>
+
+- Fix ownership of /var/spool/slurm on new installations
+  and upgrade (boo#1158696).
+
+-------------------------------------------------------------------
+Thu Oct 31 10:18:21 UTC 2019 - Egbert Eich <[email protected]>
+
+- Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727).
+- Fix %posttrans macro _res_update to cope with added newline
+  (bsc#1153259).
+
+-------------------------------------------------------------------
+Mon Oct 21 15:54:43 UTC 2019 - Egbert Eich <[email protected]>
+
+- Add package slurm-webdoc which sets up a web server to provide
+  the documentation for the version shipped.
+
+-------------------------------------------------------------------
+Mon Oct 7 15:39:43 UTC 2019 - Egbert Eich <[email protected]>
+
+- Move srun from 'slurm' to 'slurm-node': srun is required on the
+  nodes as well so sbatch will work.
'slurm-node' is a requirement
+  when 'slurm' is installed (bsc#1153095).
+
+-------------------------------------------------------------------
+Wed Oct 2 08:26:02 UTC 2019 - Egbert Eich <[email protected]>
+
+- Set %base_ver for SLE-15-SP2 to 18.08 (for now).
+
+-------------------------------------------------------------------
+Wed Sep 11 10:55:25 UTC 2019 - Egbert Eich <[email protected]>
+
+- Edit sample configuration to deprecate "ControlMachine",
+  "ControlAddr", "BackupController" and "BackupAddr" in favor
+  "SlurmctldHost".
+
+-------------------------------------------------------------------
+Sat Aug 17 14:20:35 UTC 2019 - Egbert Eich <[email protected]>
+
+- Fix logic of slurm-munge recommends: slurm-munge requires munge
+  already, so if we have munge installed we recommend slurm-munge
+  as the authentication when installing slurm or slurm-node.
+
+-------------------------------------------------------------------
+Sun Jul 14 13:28:13 UTC 2019 - Egbert Eich <[email protected]>
+
+- Fix build for SLE-11-SP4 and older.
+
+-------------------------------------------------------------------
+Fri Jul 12 09:04:55 UTC 2019 - Christian Goll <[email protected]>
+
+- added cray depend libraries to seperate package, as they are now
+  built, since json is enabled
+
+-------------------------------------------------------------------
+Thu Jul 11 10:57:52 UTC 2019 - Christian Goll <[email protected]>
+
+- Updated to 18.08.8 for fixing (CVE-2019-12838, bsc#1140709, jsc#SLE-7341,
+  jsc#SLE-7342)
+  * Update "xauth list" to use the same 10000ms timeout as the other xauth
+    commands.
+  * Fix issue in gres code to handle a gres cnt of 0.
+  * Don't purge jobs if backfill is running.
+  * Verify job is pending add/removing accrual time.
+  * Don't abort when the job doesn't have an association that was removed
+    before the job was able to make it to the database.
+  * Set state_reason if select_nodes() fails job for QOS or Account.
+  * Avoid seg_fault on referencing association without a valid_qos bitmap.
+  * If Association/QOS is removed on a pending job set that job as ineligible.
+  * When changing a jobs account/qos always make sure you remove the old limits.
+  * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or
+    account changed.
+  * Restore "sreport -T ALL" functionality.
+  * Correctly typecast signals being sent through the api.
+  * Properly initialize structures throughout Slurm.
+  * Sync "numtask" squeue format option for jobs and steps to "numtasks".
+  * Fix sacct -PD to avoid CA before start jobs.
+  * Fix potential deadlock with backup slurmctld.
+  * Fixed issue with jobs not appearing in sacct after dependency satisfied.
+  * Fix showing non-eligible jobs when asking with -j and not -s.
+  * Fix issue with backfill scheduler scheduling tasks of an array
+    when not the head job.
+  * accounting_storage/mysql - fix SIGABRT in the archive load logic.
+  * accounting_storage/mysql - fix memory leak in the archive load logic.
+  * Limit records per single SQL statement when loading archived data.
+  * Fix unnecessary reloading of job submit plugins.
+  * Allow job submit plugins to be turned on/off with a reconfigure.
+  * Fix segfault when loading/unloading Lua job submit plugin multiple times.
+  * Fix printing duplicate error messages of jobs rejected by job submit plugin.
+  * Fix printing of job submit plugin messages of het jobs without pack id.
+  * Fix memory leak in group_cache.c
+  * Fix jobs stuck from FedJobLock when requeueing in a federation
+  * Fix requeueing job in a federation of clusters with differing associations
+  * sacctmgr - free memory before exiting in 'sacctmgr show runaway'.
+  * Fix seff showing memory overflow when steps tres mem usage is 0.
+  * Upon archive file name collision, create new archive file instead of
+    overwriting the old one to prevent lost records.
+  * Limit archive files to 50000 records per file so that archiving large
+    databases will succeed.
+  * Remove stray newlines in SPANK plugin error messages.
+  * Fix archive loading events.
+  * In select/cons_res: Only allocate 1 CPU per node with the --overcommit and
+    --nodelist options.
+  * Fix main scheduler from potentially not running through whole queue.
+  * cons_res/job_test - prevent a job from overallocating a node memory.
+  * cons_res/job_test - fix to consider a node's current allocated memory when
+    testing a job's memory request.
+  * Fix issue where multi-node job steps on cloud nodes wouldn't finish cleaning
+    up until the end of the job (rather than the end of the step).
+  * Fix issue with a 17.11 sbcast call to a 18.08 daemon.
+  * Add new job bit_flags of JOB_DEPENDENT.
+  * Make it so dependent jobs reset the AccrueTime and do not count against any
+    AccrueTime limits.
+  * Fix sacctmgr --parsable2 output for reservations and tres.
+  * Prevent slurmctld from potential segfault after job_start_data() called
+    for completing job.
+  * Fix jobs getting on nodes with "scontrol reboot asap".
+  * Record node reboot events to database.
+  * Fix node reboot failure message getting to event table.
+  * Don't write "(null)" to event table when no event reason exists.
+  * Fix minor memory leak when clearing runaway jobs.
+  * Avoid flooding slurmctld and logging when prolog complete RPC errors occur.
+  * Fix GCC 9 compiler warnings.
+  * Fix seff human readable memory string for values below a megabyte.
+  * Fix dump/load of rejected heterogeneous jobs.
+  * For heterogeneous jobs, do not count the each component against the QOS or
+    association job limit multiple times.
+  * slurmdbd - avoid reservation flag column corruption with the use of newer
+    flags, instead preserve the older flag fields that we can still fit in the
+    smallint field, and discard the rest.
+  * Fix security issue in accounting_storage/mysql plugin on archive file loads
+    by always escaping strings within the slurmdbd. CVE-2019-12838.
+
+
+-------------------------------------------------------------------
+Mon Jul 8 08:19:23 UTC 2019 - Egbert Eich <[email protected]>
+
+- Fix build dependency issue around libibmad-devel introduced
+  in SLE-12-SP4.
+
+-------------------------------------------------------------------
+Mon Jul 8 05:41:11 UTC 2019 - Egbert Eich <[email protected]>
+
++++ 1254 more lines (skipped)
++++ between /dev/null
++++ and /work/SRC/openSUSE:Leap:15.1:Update/.slurm.15119.new.5913/slurm.changes

New:
----
  Fix-Authentication-Bypass-when-Message-Aggregation-is-enabled-CVE-2020-12693.patch
  PMIx-fix-potential-buffer-overflows-from-use-of-unpackmen_CVE-2020-27745.patch
  X11-forwarding-avoid-unsafe-use-of-magic-cookie_CVE-2020-27746.patch
  _service
  pam_slurm-Initialize-arrays-and-pass-sizes.patch
  pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
  pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
  pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
  removed-deprecated-xdaemon.patch
  slurm-18.08.9.tar.bz2
  slurm-2.4.4-init.patch
  slurm-2.4.4-rpath.patch
  slurm-rpmlintrc
  slurm.changes
  slurm.spec
  slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch
  slurmctld-uses-xdaemon_-for-systemd.patch
  slurmd-uses-xdaemon_-for-systemd.patch
  slurmdbd-uses-xdaemon_-for-systemd.patch
  slurmsmwd-uses-xdaemon_-for-systemd.patch
  split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ slurm.spec ++++++
++++ 1209 lines (skipped)

++++++ Fix-Authentication-Bypass-when-Message-Aggregation-is-enabled-CVE-2020-12693.patch ++++++
From: Egbert Eich <[email protected]>
Date: Tue Jul 7 09:59:21 2020 +0200
Subject: Fix Authentication Bypass when Message Aggregation is enabled CVE-2020-12693
Patch-mainline: N/A
Git-commit:
66d16879f4dd0f5f88c0e800997d6b9b674cccb5 References: bsc#1172004 This fixes and issue where authentication could be bypassed via an alternate path or channel when message Aggregation was enabled. A race condition allowed a user to launch a process as an arbitrary user. (CVE-2020-12693, bsc#1172004). Signed-off-by: Egbert Eich <[email protected]> --- src/slurmd/slurmd/req.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/slurmd/slurmd/req.c b/src/slurmd/slurmd/req.c index f176cc9..b07d3f6 100644 --- a/src/slurmd/slurmd/req.c +++ b/src/slurmd/slurmd/req.c @@ -2291,7 +2291,7 @@ _rpc_batch_job(slurm_msg_t *msg, bool new_msg) bool replied = false, revoked; slurm_addr_t *cli = &msg->orig_addr; - if (new_msg) { + if (1 || new_msg) { uid_t req_uid = g_slurm_auth_get_uid(msg->auth_cred, conf->auth_info); if (!_slurm_authorized_user(req_uid)) { @@ -5335,7 +5335,9 @@ _rpc_complete_batch(slurm_msg_t *msg) msg->data = NULL; msg_aggr_add_msg(req_msg, 1, - _handle_old_batch_job_launch); + running_serial ? + _handle_old_batch_job_launch : + NULL); return; } else { slurm_msg_t req_msg; ++++++ PMIx-fix-potential-buffer-overflows-from-use-of-unpackmen_CVE-2020-27745.patch ++++++ From c3142dd87e06621ff148791c3d2f298b5c0b3a81 Mon Sep 17 00:00:00 2001 From: Tim Wickberg <[email protected]> Date: Thu, 12 Nov 2020 08:47:51 -0800 Subject: PMIx - fix potential buffer overflows from use of unpackmem(). CVE-2020-27745. --- diff --git a/src/plugins/mpi/pmix/pmixp_coll_ring.c b/src/plugins/mpi/pmix/pmixp_coll_ring.c index 20c54edfe6..64da0c9a6a 100644 --- a/src/plugins/mpi/pmix/pmixp_coll_ring.c +++ b/src/plugins/mpi/pmix/pmixp_coll_ring.c @@ -148,6 +148,7 @@ int pmixp_coll_ring_unpack(Buf buf, pmixp_coll_type_t *type, uint32_t nprocs = 0; uint32_t tmp; int rc, i; + char *temp_ptr; /* 1. extract the type of collective */ if (SLURM_SUCCESS != (rc = unpack32(&tmp, buf))) { @@ -168,13 +169,13 @@ int pmixp_coll_ring_unpack(Buf buf, pmixp_coll_type_t *type, /* 3. 
get namespace/rank of particular process */ for (i = 0; i < (int)nprocs; i++) { - rc = unpackmem(procs[i].nspace, &tmp, buf); - if (SLURM_SUCCESS != rc) { + if ((rc = unpackmem_ptr(&temp_ptr, &tmp, buf)) || + (strlcpy(procs[i].nspace, temp_ptr, + PMIXP_MAX_NSLEN + 1) > PMIXP_MAX_NSLEN)) { PMIXP_ERROR("Cannot unpack namespace for process #%d", i); return rc; } - procs[i].nspace[tmp] = '\0'; rc = unpack32(&tmp, buf); procs[i].rank = tmp; @@ -186,11 +187,14 @@ int pmixp_coll_ring_unpack(Buf buf, pmixp_coll_type_t *type, } /* 4. extract the ring info */ - if (SLURM_SUCCESS != (rc = unpackmem((char *)ring_hdr, &tmp, buf))) { + if ((rc = unpackmem_ptr(&temp_ptr, &tmp, buf)) || + (tmp != sizeof(pmixp_coll_ring_msg_hdr_t))) { PMIXP_ERROR("Cannot unpack ring info"); return rc; } + memcpy(ring_hdr, temp_ptr, sizeof(pmixp_coll_ring_msg_hdr_t)); + return SLURM_SUCCESS; } diff --git a/src/plugins/mpi/pmix/pmixp_coll_tree.c b/src/plugins/mpi/pmix/pmixp_coll_tree.c index b0990e92ce..4829c2286c 100644 --- a/src/plugins/mpi/pmix/pmixp_coll_tree.c +++ b/src/plugins/mpi/pmix/pmixp_coll_tree.c @@ -76,6 +76,7 @@ int pmixp_coll_tree_unpack(Buf buf, pmixp_coll_type_t *type, uint32_t nprocs = 0; uint32_t tmp; int i, rc; + char *temp_ptr; /* 1. extract the type of collective */ if (SLURM_SUCCESS != (rc = unpack32(&tmp, buf))) { @@ -96,13 +97,13 @@ int pmixp_coll_tree_unpack(Buf buf, pmixp_coll_type_t *type, for (i = 0; i < (int)nprocs; i++) { /* 3. 
get namespace/rank of particular process */ - rc = unpackmem(procs[i].nspace, &tmp, buf); - if (SLURM_SUCCESS != rc) { + if ((rc = unpackmem_ptr(&temp_ptr, &tmp, buf)) || + (strlcpy(procs[i].nspace, temp_ptr, + PMIXP_MAX_NSLEN + 1) > PMIXP_MAX_NSLEN)) { PMIXP_ERROR("Cannot unpack namespace for process #%d", i); return rc; } - procs[i].nspace[tmp] = '\0'; unsigned int tmp; rc = unpack32(&tmp, buf); -- 2.29.2 ++++++ X11-forwarding-avoid-unsafe-use-of-magic-cookie_CVE-2020-27746.patch ++++++ From 07309deb45c33e735e191faf9dd31cca1054a15c Mon Sep 17 00:00:00 2001 From: Tim Wickberg <[email protected]> Date: Thu, 12 Nov 2020 08:49:02 -0800 Subject: X11 forwarding - avoid unsafe use of magic cookie as arg to xauth command. Magic cookie can leak through /proc this way. There is a race here between this usually short-lived xauth command running and an attacker scraping the value from /proc. This can be exacerbated through use of X11Parameters=home_xauthority on a cluster with a shared home directory under heavy load. CVE-2020-27746. Note from Ana Guerrero <[email protected]> The patch got a light modification from the git commit, given that in slurm 18.08 the run_command had an argument less. This doesn't affect the security fix. 
--- diff --git a/src/common/x11_util.c b/src/common/x11_util.c index d7f2457748..275f222945 100644 --- a/src/common/x11_util.c +++ b/src/common/x11_util.c @@ -185,27 +185,44 @@ int i=0, status; char *result; char **xauth_argv; + char template[] = "/tmp/xauth-source-XXXXXX"; + char *contents = NULL; + int fd; + + /* protect against weak file permissions in old glibc */ + umask(0077); + if ((fd = mkstemp(template)) < 0) + fatal("%s: could not create temp file", __func__); + + xstrfmtcat(contents, "add %s/unix:%u MIT-MAGIC-COOKIE-1 %s\n", + host, display, cookie); + safe_write(fd, contents, strlen(contents)); + xfree(contents); + close(fd); xauth_argv = xmalloc(sizeof(char *) * 10); - xauth_argv[i++] = xstrdup("xauth"); - xauth_argv[i++] = xstrdup("-v"); - xauth_argv[i++] = xstrdup("-f"); - xauth_argv[i++] = xstrdup(xauthority); - xauth_argv[i++] = xstrdup("add"); - xauth_argv[i++] = xstrdup_printf("%s/unix:%u", host, display); - xauth_argv[i++] = xstrdup("MIT-MAGIC-COOKIE-1"); - xauth_argv[i++] = xstrdup(cookie); + xauth_argv[i++] = "xauth"; + xauth_argv[i++] = "-v"; + xauth_argv[i++] = "-f"; + xauth_argv[i++] = xauthority; + xauth_argv[i++] = "source"; + xauth_argv[i++] = template; xauth_argv[i++] = NULL; xassert(i < 10); result = run_command("xauth", XAUTH_PATH, xauth_argv, 10000, &status); - free_command_argv(xauth_argv); + (void) unlink(template); + xfree(xauth_argv); debug2("%s: result from xauth: %s", __func__, result); xfree(result); return status; + +rwfail: + fatal("%s: could not write temporary xauth file", __func__); + return SLURM_ERROR; } extern int x11_delete_xauth(char *xauthority, char *host, uint16_t display) ++++++ _service ++++++ <services> <service name="download_files" mode="localonly"> <param name="enforceupstream">yes</param> </service> </services> ++++++ pam_slurm-Initialize-arrays-and-pass-sizes.patch ++++++ From: Sebastian Krahmer <[email protected]> Date: Thu Feb 2 09:49:38 2017 +0100 Subject: [PATCH]pam_slurm: Initialize arrays and pass 
sizes Git-repo: https://github.com/SchedMD/slurm Git-commit: fbfbb90f6a2e7f134220991ed3263894ba365411 References: bsc#1007053 Signed-off-by: Egbert Eich <[email protected]> PAM is security critical: - clear arrays - ensure strings are NULL-terminated. Signed-off-by: Egbert Eich <[email protected]> --- contribs/pam/pam_slurm.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/contribs/pam/pam_slurm.c b/contribs/pam/pam_slurm.c index 0968a9c..ee179d5 100644 --- a/contribs/pam/pam_slurm.c +++ b/contribs/pam/pam_slurm.c @@ -266,9 +266,9 @@ static int _gethostname_short (char *name, size_t len) { int error_code, name_len; - char *dot_ptr, path_name[1024]; + char *dot_ptr, path_name[1024] = {0}; - error_code = gethostname(path_name, sizeof(path_name)); + error_code = gethostname(path_name, sizeof(path_name) - 1); if (error_code) return error_code; @@ -296,11 +296,11 @@ static int _slurm_match_allocation(uid_t uid) { int authorized = 0, i; - char hostname[MAXHOSTNAMELEN]; + char hostname[MAXHOSTNAMELEN] = {0}; char *nodename = NULL; job_info_msg_t * msg; - if (_gethostname_short(hostname, sizeof(hostname)) < 0) { + if (_gethostname_short(hostname, sizeof(hostname) - 1) < 0) { _log_msg(LOG_ERR, "gethostname: %m"); return 0; } @@ -409,7 +409,7 @@ _send_denial_msg(pam_handle_t *pamh, struct _options *opts, */ extern void libpam_slurm_init (void) { - char libslurmname[64]; + char libslurmname[64] = {0}; if (slurm_h) return; @@ -417,10 +417,10 @@ extern void libpam_slurm_init (void) /* First try to use the same libslurm version ("libslurm.so.24.0.0"), * Second try to match the major version number ("libslurm.so.24"), * Otherwise use "libslurm.so" */ - if (snprintf(libslurmname, sizeof(libslurmname), + if (snprintf(libslurmname, sizeof(libslurmname) - 1, "libslurm.so.%d.%d.%d", SLURM_API_CURRENT, SLURM_API_REVISION, SLURM_API_AGE) >= - sizeof(libslurmname) ) { + sizeof(libslurmname) - 1) { _log_msg (LOG_ERR, "Unable to write 
libslurmname\n"); } else if ((slurm_h = dlopen(libslurmname, RTLD_NOW|RTLD_GLOBAL))) { return; @@ -429,8 +429,10 @@ extern void libpam_slurm_init (void) libslurmname, dlerror ()); } - if (snprintf(libslurmname, sizeof(libslurmname), "libslurm.so.%d", - SLURM_API_CURRENT) >= sizeof(libslurmname) ) { + memset(libslurmname, 0, sizeof(libslurmname)); + + if (snprintf(libslurmname, sizeof(libslurmname) - 1, "libslurm.so.%d", + SLURM_API_CURRENT) >= sizeof(libslurmname) - 1) { _log_msg (LOG_ERR, "Unable to write libslurmname\n"); } else if ((slurm_h = dlopen(libslurmname, RTLD_NOW|RTLD_GLOBAL))) { return; ++++++ pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch ++++++ From 4c38389917a54e137a4578b45f0f6a821c8c591a Mon Sep 17 00:00:00 2001 From: Matthias Gerstner <[email protected]> Date: Wed, 5 Dec 2018 15:03:19 +0100 Subject: [PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM service context This pam module is tailored towards running in the context of remote ssh logins. When running in a different context like a local sudo call then the module could be influenced by e.g. passing environment variables like SLURM_CONF. By limiting the module to only perform its actions when running in the sshd context by default this situation can be avoided. An additional pam module argument service=<service> allows an Administrator to control this behaviour, if different behaviour is explicitly desired. 
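
The service check described above boils down to a small decision that can be shown without any PAM dependency. The sketch below is an assumption-laden illustration: `service_is_allowed` is a hypothetical name, and it returns a plain boolean, whereas the module itself reports a mismatch as PAM_IGNORE.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-alone version of the decision described above.
 * `configured` is the value of the module's service= argument (NULL when
 * unset); `actual` is the PAM service name of the calling application.
 */
bool service_is_allowed(const char *configured, const char *actual)
{
	const char *allowed = configured ? configured : "sshd";

	if (strcmp(allowed, "*") == 0)
		return true;	/* wildcard: run in any service context */
	if (actual == NULL)
		return false;	/* no service name available */
	return strcmp(actual, allowed) == 0;
}
```

With no argument given, only "sshd" passes; `service=login` narrows the module to that service, and `service=*` restores the old behaviour of running everywhere.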
Signed-off-by: Christian Goll <[email protected]> --- contribs/pam_slurm_adopt/README | 172 ++++++++++++++++++++++++++++- contribs/pam_slurm_adopt/pam_slurm_adopt.c | 46 ++++++++ 2 files changed, 217 insertions(+), 1 deletion(-) diff --git a/contribs/pam_slurm_adopt/README b/contribs/pam_slurm_adopt/README index 07039740f8..8baece6d2e 100644 --- a/contribs/pam_slurm_adopt/README +++ b/contribs/pam_slurm_adopt/README @@ -1,5 +1,175 @@ Current documentation can be found here: https://slurm.schedmd.com/pam_slurm_adopt.html - (Which is generated from docs/html/pam_slurm_adopt.shtml.) + +======= +AUTHOR + Ryan Cox <[email protected]> + +MODULE TYPES PROVIDED + account + +DESCRIPTION + This module attempts to determine the job which originated this connection. + The module is configurable; these are the default steps: + + 1) Check the local stepd for a count of jobs owned by the non-root user + a) If none, deny (option action_no_jobs) + b) If only one, adopt the process into that job + c) If multiple, continue + 2) Determine src/dst IP/port of socket + 3) Issue callerid RPC to slurmd at IP address of source + a) If the remote slurmd can identify the source job, adopt into that job + b) If not, continue + 4) Pick a random local job from the user to adopt into (option action_unknown) + + Jobs are adopted into a job's allocation step. + +MODULE OPTIONS +This module has the following options (* = default): + + ignore_root - By default, all root connections are ignored. If the RPC + is sent to a node which drops packets to the slurmd port, the + RPC will block for some time before failing. This is + unlikely to be desirable. Likewise, root may be trying to + administer the system and not do work that should be in a job. + The job may trigger oom-killer or just exit. If root restarts + a service or similar, it will be tracked and killed by Slurm + when the job exits. This sounds bad because it is bad. + + 1* = Let the connection through without adoption + 0 = I am crazy. 
I want random services to die when root jobs exit. I + also like it when RPCs block for a while then time out. + + + action_no_jobs - The action to perform if the user has no jobs on the node + + ignore = Do nothing. Fall through to the next pam module + deny* = Deny the connection + + + action_unknown - The action to perform when the user has multiple jobs on + the node *and* the RPC does not locate the source job. + If the RPC mechanism works properly in your environment, + this option will likely be relevant *only* when connecting + from a login node. + + newest* = Pick the newest job on the node. The "newest" job is chosen + based on the mtime of the job's step_extern cgroup; asking + Slurm would require an RPC to the controller. The user can ssh + in but may be adopted into a job that exits earlier than the + job they intended to check on. The ssh connection will at + least be subject to appropriate limits and the user can be + informed of better ways to accomplish their objectives if this + becomes a problem + allow = Let the connection through without adoption + deny = Deny the connection + + + action_adopt_failure - The action to perform if the process is unable to be + adopted into any job for whatever reason. If the + process cannot be adopted into the job identified by + the callerid RPC, it will fall through to the + action_unknown code and try to adopt there. A failure + at that point or if there is only one job will result + in this action being taken. + + allow* = Let the connection through without adoption + deny = Deny the connection + + action_generic_failure - The action to perform if there are certain failures + such as the inability to talk to the local slurmd + or if the kernel doesn't offer the correct + facilities. + + ignore* = Do nothing. Fall through to the next pam module + allow = Let the connection through without adoption + deny = Deny the connection + + log_level - See SlurmdDebug in slurm.conf(5) for available options. 
The + default log_level is info. + + disable_x11 - turn off Slurm built-in X11 forwarding support. + + 1 = Do not check for Slurm's X11 forwarding support, and no not + alter the DISPLAY variable. + 0* = If the step the job is adopted into has X11 enabled, set + the DISPLAY variable in the processes environment accordingly. + + service - The pam service name for which this module should run. By default + it only runs for sshd for which it was designed for. A + different service name can be specified like "login" or "*" to + allow the module to in any service context. For local pam logins + this module could cause unexpected behaviour or even security + issues. Therefore if the service name does not match then this + module will not perform the adoption logic and returns + PAM_IGNORE immediately. + +SLURM.CONF CONFIGURATION + PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step + into which ssh-launched processes will be adopted. + + **** IMPORTANT **** + PrologFlags=contain must be in place *before* using this module. + The module bases its checks on local steps that have already been launched. If + the user has no steps on the node, such as the extern step, the module will + assume that the user has no jobs allocated to the node. Depending on your + configuration of the pam module, you might deny *all* user ssh attempts. + +NOTES + This module and the related RPC currently support Linux systems which + have network connection information available through /proc/net/tcp{,6}. A + proccess's sockets must exist as symlinks in its /proc/self/fd directory. + + The RPC data structure itself is OS-agnostic. If support is desired for a + different OS, relevant code must be added to find one's socket information + then match that information on the remote end to a particular process which + Slurm is tracking. + + IPv6 is supported by the RPC data structure itself and the code which sends it + and receives it. 
Sending the RPC to an IPv6 address is not currently + supported by Slurm. Once support is added, remove the relevant check in + slurm_network_callerid(). + + For the action_unknown=newest setting to work, the memory cgroup must be in + use so that the code can check mtimes of cgroup directories. If you would + prefer to use a different subsystem, modify the _indeterminate_multiple + function. + +FIREWALLS, IP ADDRESSES, ETC. + slurmd should be accessible on any IP address from which a user might launch + ssh. The RPC to determine the source job must be able to reach the slurmd + port on that particular IP address. + + If there is no slurmd on the source node, such as on a login node, it is + better to have the RPC be rejected rather than silently dropped. This + will allow better responsiveness to the RPC initiator. + +EXAMPLES / SUGGESTED USAGE + Use of this module is recommended on any compute node. + + Add the following line to the appropriate file in /etc/pam.d, such as + system-auth or sshd: + + account sufficient pam_slurm_adopt.so + + If you always want to allow access for an administrative group (e.g. wheel), + stack the pam_access module after pam_slurm_adopt. A success with + pam_slurm_adopt is sufficient to allow access but the pam_access module can + allow others, such as staff, access even without jobs. + + account sufficient pam_slurm_adopt.so + account required pam_access.so + + + Then edit the pam_access configuration file (/etc/security/access.conf): + + +:wheel:ALL + -:ALL:ALL + + When access is denied, the user will receive a relevant error message. + + pam_systemd.so is known to not play nice with Slurm's usage of cgroups. It is + recommended that you disable it or possibly add pam_slurm_adopt.so after + pam_systemd.so. 
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c index 51f21e8729..dccad90185 100644 --- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c +++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c @@ -94,6 +94,7 @@ static struct { log_level_t log_level; char *node_name; bool disable_x11; + char *pam_service; } opts; static void _init_opts(void) @@ -107,6 +108,7 @@ static void _init_opts(void) opts.log_level = LOG_LEVEL_INFO; opts.node_name = NULL; opts.disable_x11 = false; + opts.pam_service = NULL; } static slurm_cgroup_conf_t *slurm_cgroup_conf = NULL; @@ -576,6 +578,9 @@ static void _parse_opts(pam_handle_t *pamh, int argc, const char **argv) opts.node_name = xstrdup(v); } else if (!xstrncasecmp(*argv, "disable_x11=1", 13)) { opts.disable_x11 = true; + } else if (!xstrncasecmp(*argv, "service=", 8)) { + v = (char *)(8 + *argv); + opts.pam_service = xstrdup(v); } } @@ -601,6 +606,40 @@ static int _load_cgroup_config() return SLURM_SUCCESS; } +/* Make sure to only continue if we're running in the sshd context + * + * If this module is used locally e.g. via sudo then unexpected things might + * happen (e.g. passing environment variables interpreted by slurm code like + * SLURM_CONF or inheriting file descriptors that are used by _try_rpc()). + */ +static int check_pam_service(pam_handle_t *pamh) +{ + const char *allowed = opts.pam_service ? 
opts.pam_service : "sshd"; + char *service = NULL; + int rc; + + if (!strcmp(allowed, "*")) + // any service name is allowed + return PAM_SUCCESS; + + rc = pam_get_item(pamh, PAM_SERVICE, (void*)&service); + + if (rc != PAM_SUCCESS) { + pam_syslog(pamh, LOG_ERR, "failed to obtain PAM_SERVICE name"); + return rc; + } + else if (service == NULL) { + // this shouldn't actually happen + return PAM_BAD_ITEM; + } + + if (!strcmp(service, allowed)) { + return PAM_SUCCESS; + } + + pam_syslog(pamh, LOG_INFO, "Not adopting process since this is not an allowed pam service"); + return PAM_IGNORE; +} /* Parse arguments, etc then get my socket address/port information. Attempt to * adopt this process into a job in the following order: @@ -622,6 +661,12 @@ PAM_EXTERN int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags _init_opts(); _parse_opts(pamh, argc, argv); + + retval = check_pam_service(pamh); + if (retval != PAM_SUCCESS) { + return retval; + } + _log_init(opts.log_level); switch (opts.action_generic_failure) { @@ -765,6 +810,7 @@ cleanup: xfree(buf); xfree(slurm_cgroup_conf); xfree(opts.node_name); + xfree(opts.pam_service); return rc; } -- 2.16.4 ++++++ pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch ++++++ From a5d4481c05e2afa1ff1920446663e66c48ef9277 Mon Sep 17 00:00:00 2001 From: Matthias Gerstner <[email protected]> Date: Wed, 5 Dec 2018 14:08:07 +0100 Subject: [PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data into message Using memcpy, an amount of undefined data from the stack will be copied into the target buffer. While pam_conv probably doesn't evalute the extra data it still unclean to do that. It could lead up to an information leak somewhen. 
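
To make the point above concrete: `memcpy(dst, src, sizeof(dst))` always reads `sizeof(dst)` bytes from `src`, even when the string in `src` is shorter, so whatever follows the string in memory is copied too. A minimal sketch of a copy that reads only the string's own bytes is given below. `copy_message` is a hypothetical helper, not code from the patch. It terminates the result explicitly, because `strncpy()` does not append a terminator when the source fills the whole buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `src` into `dst` (capacity `dst_size`), reading
 * no bytes past the end of `src` and always NUL-terminating the result.
 * Returns the number of characters copied, excluding the terminator.
 */
size_t copy_message(char *dst, size_t dst_size, const char *src)
{
	size_t n;

	if (dst_size == 0)
		return 0;

	n = strlen(src);
	if (n > dst_size - 1)
		n = dst_size - 1;	/* truncate to fit */

	memcpy(dst, src, n);	/* only bytes that belong to src */
	dst[n] = '\0';
	return n;
}
```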
Signed-off-by: Christian Goll <[email protected]> --- contribs/pam_slurm_adopt/helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contribs/pam_slurm_adopt/helper.c b/contribs/pam_slurm_adopt/helper.c index 9c3e202a87..1bac0a0fcf 100644 --- a/contribs/pam_slurm_adopt/helper.c +++ b/contribs/pam_slurm_adopt/helper.c @@ -128,7 +128,7 @@ send_user_msg(pam_handle_t *pamh, const char *mesg) /* Construct msg to send to app. */ - memcpy(str, mesg, sizeof(str)); + strncpy(str, mesg, sizeof(str)); msg[0].msg_style = PAM_ERROR_MSG; msg[0].msg = str; pmsg[0] = &msg[0]; -- 2.16.4 ++++++ pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch ++++++ From d630acbf5709dcf03f9e8cd1739a77cfe6c1e4b8 Mon Sep 17 00:00:00 2001 From: Matthias Gerstner <[email protected]> Date: Wed, 5 Dec 2018 15:08:53 +0100 Subject: [PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is logging on In some systems there can be multiple user accounts for uid 0, therefore the check for literal user name "root" might be insufficient. Signed-off-by: Christian Goll <[email protected]> --- contribs/pam_slurm_adopt/pam_slurm_adopt.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c index dccad90185..f1d062885e 100644 --- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c +++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c @@ -708,17 +708,6 @@ PAM_EXTERN int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags opts.ignore_root = 1; } - /* Ignoring root is probably best but the admin can allow it */ - if (!strcmp(user_name, "root")) { - if (opts.ignore_root) { - info("Ignoring root user"); - return PAM_IGNORE; - } else { - /* This administrator is crazy */ - info("Danger!!! This is a connection attempt by root and ignore_root=0 is set! 
Hope for the best!"); - } - } - /* Calculate buffer size for getpwnam_r */ bufsize = sysconf(_SC_GETPW_R_SIZE_MAX); if (bufsize == -1) @@ -740,6 +729,16 @@ PAM_EXTERN int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags if (_load_cgroup_config() != SLURM_SUCCESS) return rc; + /* Ignoring root is probably best but the admin can allow it */ + if (pwd.pw_uid == 0) { + if (opts.ignore_root) { + info("Ignoring root user"); + return PAM_IGNORE; + } else { + /* This administrator is crazy */ + info("Danger!!! This is a connection attempt by root (user id 0) and ignore_root=0 is set! Hope for the best!"); + } + } /* * Check if there are any steps on the node from any user. A failure here -- 2.16.4 ++++++ removed-deprecated-xdaemon.patch ++++++ From: Egbert Eich <[email protected]> Date: Tue Nov 20 11:54:02 2018 +0100 Subject: removed deprecated xdaemon Patch-mainline: Not yet Git-commit: b39551df0f202203c16d4e9a9a7b640691acf882 References: bsc#1084125 Signed-off-by: Egbert Eich <[email protected]> --- slurm-18.08.3/src/common/daemonize.c | 12 ------------ slurm-18.08.3/src/common/daemonize.h | 1 - 2 files changed, 13 deletions(-) diff --git a/slurm-18.08.3/src/common/daemonize.c b/slurm-18.08.3/src/common/daemonize.c index fee9d60..bec8202 100644 --- a/src/common/daemonize.c +++ b/src/common/daemonize.c @@ -138,18 +138,6 @@ void xdaemon_finish(int fd) } } -/* - * keep depercated api - */ - -int xdaemon(void) -{ - int ret_val; - ret_val= xdaemon_init(); - xdaemon_finish(ret_val); - return ret_val; -} - /* * Read and return pid stored in pidfile. * Returns 0 if file doesn't exist or pid cannot be read. diff --git a/slurm-18.08.3/src/common/daemonize.h b/slurm-18.08.3/src/common/daemonize.h index 8b60b4f..b7cb625 100644 --- a/src/common/daemonize.h +++ b/src/common/daemonize.h @@ -44,7 +44,6 @@ * Start fork process into background and inherit new session. 
* */ -extern int xdaemon(void); extern int xdaemon_init(void); /* ++++++ slurm-2.4.4-init.patch ++++++ diff -aruN slurm-2.4.4.orig/etc/init.d.slurmdbd.in slurm-2.4.4/etc/init.d.slurmdbd.in --- slurm-2.4.4.orig/etc/init.d.slurmdbd.in 2012-11-02 17:46:12.000000000 +0100 +++ slurm-2.4.4/etc/init.d.slurmdbd.in 2012-11-17 19:00:06.079651971 +0100 @@ -15,7 +15,7 @@ # Required-Stop: $remote_fs $syslog $network munge # Should-Start: $named # Should-Stop: $named -# Default-Start: 2 3 4 5 +# Default-Start: 2 3 5 # Default-Stop: 0 1 6 # Short-Description: SLURM database daemon # Description: Start slurm to provide database server for SLURM diff -aruN slurm-2.4.4.orig/etc/init.d.slurm.in slurm-2.4.4/etc/init.d.slurm.in --- slurm-2.4.4.orig/etc/init.d.slurm.in 2012-11-02 17:46:12.000000000 +0100 +++ slurm-2.4.4/etc/init.d.slurm.in 2012-11-17 18:59:51.799652475 +0100 @@ -19,7 +19,7 @@ # Required-Stop: $remote_fs $syslog $network munge # Should-Start: $named # Should-Stop: $named -# Default-Start: 2 3 4 5 +# Default-Start: 2 3 5 # Default-Stop: 0 1 6 # Short-Description: slurm daemon management # Description: Start slurm to provide resource management ++++++ slurm-2.4.4-rpath.patch ++++++ diff -aruN slurm-2.4.4.orig/contribs/perlapi/libslurm/perl/Makefile.PL.in slurm-2.4.4/contribs/perlapi/libslurm/perl/Makefile.PL.in --- slurm-2.4.4.orig/contribs/perlapi/libslurm/perl/Makefile.PL.in 2012-11-02 17:46:12.000000000 +0100 +++ slurm-2.4.4/contribs/perlapi/libslurm/perl/Makefile.PL.in 2012-11-17 17:42:51.919815606 +0100 @@ -77,7 +77,7 @@ # AIX has problems with not always having the correct # flags so we have to add some :) my $os = lc(`uname`); -my $other_ld_flags = "-Wl,-rpath,@top_builddir@/src/api/.libs -Wl,-rpath,@prefix@/lib"; +my $other_ld_flags = "-L@top_builddir@/src/api/.libs -lslurm"; $other_ld_flags = " -brtl -G -bnoentry -bgcbypass:1000 -bexpfull" if $os =~ "aix"; @@ -88,7 +88,7 @@ ($] >= 5.005 ? 
## Add these new keywords supported since 5.005 (ABSTRACT_FROM => 'lib/Slurm.pm', # retrieve abstract from module AUTHOR => 'Hongjia Cao <[email protected]>') : ()), - LIBS => ["-L@top_builddir@/src/api/.libs -L@prefix@/lib -lslurm"], # e.g., '-lm' + LIBS => ["-L@prefix@/lib -lslurm"], # e.g., '-lm' DEFINE => '', # e.g., '-DHAVE_SOMETHING' INC => "-I. -I@top_srcdir@ -I@top_srcdir@/contribs/perlapi/common -I@top_builddir@", # Un-comment this if you add C files to link with later: diff -aruN slurm-2.4.4.orig/contribs/perlapi/libslurmdb/perl/Makefile.PL.in slurm-2.4.4/contribs/perlapi/libslurmdb/perl/Makefile.PL.in --- slurm-2.4.4.orig/contribs/perlapi/libslurmdb/perl/Makefile.PL.in 2012-11-02 17:46:12.000000000 +0100 +++ slurm-2.4.4/contribs/perlapi/libslurmdb/perl/Makefile.PL.in 2012-11-17 17:41:27.163818599 +0100 @@ -76,7 +76,7 @@ # AIX has problems with not always having the correct # flags so we have to add some :) my $os = lc(`uname`); -my $other_ld_flags = "-Wl,-rpath,@top_builddir@/src/db_api/.libs -Wl,-rpath,@prefix@/lib"; +my $other_ld_flags = "-L@top_builddir@/src/api/.libs -lslurm"; $other_ld_flags = " -brtl -G -bnoentry -bgcbypass:1000 -bexpfull" if $os =~ "aix"; @@ -87,7 +87,7 @@ ($] >= 5.005 ? ## Add these new keywords supported since 5.005 (ABSTRACT_FROM => 'Slurmdb.pm', # retrieve abstract from module AUTHOR => 'Don Lipari <[email protected]>') : ()), - LIBS => ["-L@top_builddir@/src/db_api/.libs -L@prefix@/lib -lslurmdb"], # e.g., '-lm' + LIBS => ["-L@prefix@/lib -lslurmdb"], # e.g., '-lm' DEFINE => '', # e.g., '-DHAVE_SOMETHING' INC => "-I. 
-I@top_srcdir@ -I@top_srcdir@/contribs/perlapi/common -I@top_builddir@", # Un-comment this if you add C files to link with later: ++++++ slurm-rpmlintrc ++++++ addFilter(".*obsolete-not-provided slurm-sched-wiki.*") addFilter(".*obsolete-not-provided slurmdb-direct.*") ++++++ slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch ++++++ From: Egbert Eich <[email protected]> Date: Tue Nov 20 09:22:15 2018 +0100 Subject: slurmctld: rerun agent_init() when backup controller takes over Patch-mainline: Not yet Git-commit: 21a7abc02e4a27cc64a213ba1fc8572a20e21ba9 References: bsc#1084917 A slurmctld backup controller often fails to clean up jobs which have finished, the node appears in an 'IDLE+COMPLETING' state while squeue -l still shows the job in a completing state. This situation persists until the primary controller is restarted and cleans up all tasks in 'COMPLETING' state. This issue is caused by a race condition in the backup controller: When the backup controller detects that the primary controller is inaccessible, it will run thru a restart cycle. To trigger the shutdown of some entities, it will set slurmctld_config.shutdown_time to a value != 0. Before continuing as the controller in charge, it resets this variable to 0 again. The agent which handles the request queue - from a separate thread - wakes up periodically (in a 2 sec interval) and checks for things to do. If it finds slurmctld_config.shutdown_time set to a value != 0, it will terminate. If this wakeup occurs in the 'takeover window' between the variable being set to !=0 and reset to 0, the agent goes away and will no longer be available to handle queued requests as there is nothing at the end of the 'takeover window' that would restart it. This fix adds a restart of the agent by calling agent_init() after slurmctld_config.shutdown_time has been reset to 0. Should an agent still be running (because it didn't wake up during the 'takeover window') it will be caught in agent_init(). 
Signed-off-by: Egbert Eich <[email protected]> --- src/slurmctld/backup.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/slurmctld/backup.c b/src/slurmctld/backup.c index de74513..2b4c74e 100644 --- a/src/slurmctld/backup.c +++ b/src/slurmctld/backup.c @@ -65,6 +65,7 @@ #include "src/slurmctld/read_config.h" #include "src/slurmctld/slurmctld.h" #include "src/slurmctld/trigger_mgr.h" +#include "src/slurmctld/agent.h" #define _DEBUG 0 #define SHUTDOWN_WAIT 2 /* Time to wait for primary server shutdown */ @@ -258,6 +259,9 @@ void run_backup(slurm_trigger_callbacks_t *callbacks) error("Unable to recover slurm state"); abort(); } + /* Reinit agent in case it has been terminated - agent_init() + will check itself */ + agent_init(); slurmctld_config.shutdown_time = (time_t) 0; unlock_slurmctld(config_write_lock); select_g_select_nodeinfo_set_all(); ++++++ slurmctld-uses-xdaemon_-for-systemd.patch ++++++ From: Egbert Eich <[email protected]> Date: Tue Nov 20 09:47:47 2018 +0100 Subject: slurmctld uses xdaemon_* for systemd Patch-mainline: Not yet Git-commit: 0f0c00a4a57d12be04d16f4646c186d3e5f03dd1 References: bsc#1084125 Signed-off-by: Egbert Eich <[email protected]> --- slurm-18.08.3/src/slurmctld/controller.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/slurm-18.08.3/src/slurmctld/controller.c b/slurm-18.08.3/src/slurmctld/controller.c index a1762de..d123db3 100644 --- a/src/slurmctld/controller.c +++ b/src/slurmctld/controller.c @@ -260,7 +260,7 @@ static void * _wait_primary_prog(void *arg); /* main - slurmctld main function, start various threads and process RPCs */ int main(int argc, char **argv) { - int cnt, error_code, i; + int cnt, error_code, i, fd; struct timeval start, now; struct stat stat_buf; struct rlimit rlim; @@ -326,7 +326,11 @@ int main(int argc, char **argv) if (daemonize) { slurmctld_config.daemonize = 1; - if (xdaemon()) + /* + * Just start daemonizing if not in test mode + */ + fd = xdaemon_init(); + if 
(fd == -1) error("daemon(): %m"); log_set_timefmt(slurmctld_conf.log_fmt); log_alter(log_opts, LOG_DAEMON, @@ -348,6 +352,9 @@ int main(int argc, char **argv) _init_pidfile(); _become_slurm_user(); } + if (daemonize) { + xdaemon_finish(fd); + } /* * Create StateSaveLocation directory if necessary. ++++++ slurmd-uses-xdaemon_-for-systemd.patch ++++++ From: Egbert Eich <[email protected]> Date: Tue Nov 20 09:52:22 2018 +0100 Subject: slurmd uses xdaemon_* for systemd Patch-mainline: Not yet Git-commit: 3988e62eb8c20a29a7a016f264c6d65e114cfdf4 References: bsc#1084125 Signed-off-by: Egbert Eich <[email protected]> --- slurm-18.08.3/src/slurmd/slurmd/slurmd.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/slurm-18.08.3/src/slurmd/slurmd/slurmd.c b/slurm-18.08.3/src/slurmd/slurmd/slurmd.c index aa35f8a..b2feaf9 100644 --- a/src/slurmd/slurmd/slurmd.c +++ b/src/slurmd/slurmd/slurmd.c @@ -215,7 +215,7 @@ static void _wait_for_all_threads(int secs); int main (int argc, char **argv) { - int i, pidfd; + int i, pidfd, pipefd; int blocked_signals[] = {SIGPIPE, 0}; int cc; char *oom_value; @@ -300,7 +300,8 @@ main (int argc, char **argv) * Become a daemon if desired. 
*/ if (conf->daemonize) { - if (xdaemon()) + pipefd = xdaemon_init(); + if (pipefd == -1) error("Couldn't daemonize slurmd: %m"); } test_core_limit(); @@ -356,6 +357,9 @@ main (int argc, char **argv) conf->pid = getpid(); pidfd = create_pidfile(conf->pidfile, 0); + if (conf->daemonize) { + xdaemon_finish(pipefd); + } rfc2822_timestamp(time_stamp, sizeof(time_stamp)); info("%s started on %s", slurm_prog_name, time_stamp); ++++++ slurmdbd-uses-xdaemon_-for-systemd.patch ++++++ From: Egbert Eich <[email protected]> Date: Tue Nov 20 09:58:47 2018 +0100 Subject: slurmdbd uses xdaemon_* for systemd Patch-mainline: Not yet Git-commit: 8a286cbaf3fe7ebe009106675a4624a2272d616f References: bsc#1084125 Signed-off-by: Egbert Eich <[email protected]> --- slurm-18.08.3/src/slurmdbd/slurmdbd.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/slurm-18.08.3/src/slurmdbd/slurmdbd.c b/slurm-18.08.3/src/slurmdbd/slurmdbd.c index 471c724..8c7ea94 100644 --- a/src/slurmdbd/slurmdbd.c +++ b/src/slurmdbd/slurmdbd.c @@ -103,7 +103,7 @@ static List lft_rgt_list = NULL; static void _become_slurm_user(void); static void _commit_handler_cancel(void); static void *_commit_handler(void *no_data); -static void _daemonize(void); +static int _daemonize_start(void); static void _default_sigaction(int sig); static void _free_dbd_stats(void); static void _init_config(void); @@ -127,6 +127,7 @@ int main(int argc, char **argv) { char node_name_short[128]; char node_name_long[128]; + int pipefd; void *db_conn = NULL; assoc_init_args_t assoc_init_arg; @@ -139,8 +140,9 @@ int main(int argc, char **argv) _update_nice(); _kill_old_slurmdbd(); - if (foreground == 0) - _daemonize(); + if (foreground == 0) { + pipefd = _daemonize_start(); + } /* * Need to create pidfile here in case we setuid() below @@ -149,7 +151,9 @@ int main(int argc, char **argv) * able to write a core dump. 
*/ _init_pidfile(); - + if (foreground == 0) { + xdaemon_finish(pipefd); + } /* * Do plugin init's after _init_pidfile so systemd is happy as * slurm_acct_storage_init() could take a long time to finish if running @@ -598,11 +602,14 @@ static void _init_pidfile(void) /* Become a daemon (child of init) and * "cd" to the LogFile directory (if one is configured) */ -static void _daemonize(void) +static int _daemonize_start(void) { - if (xdaemon()) + int retval; + retval = xdaemon_init(); + if (retval == -1) error("daemon(): %m"); log_alter(log_opts, LOG_DAEMON, slurmdbd_conf->log_file); + return retval; } static void _set_work_dir(void) ++++++ slurmsmwd-uses-xdaemon_-for-systemd.patch ++++++ From: Egbert Eich <[email protected]> Date: Tue Nov 20 10:07:35 2018 +0100 Subject: slurmsmwd uses xdaemon_* for systemd Patch-mainline: Not yet Git-commit: 110d76a0c56b35c8c3c9b24e136476a67a6eb413 References: bsc#1084125 Signed-off-by: Egbert Eich <[email protected]> --- slurm-18.08.3/contribs/cray/slurmsmwd/main.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/slurm-18.08.3/contribs/cray/slurmsmwd/main.c b/slurm-18.08.3/contribs/cray/slurmsmwd/main.c index a5247bf..1efb1f8 100644 --- a/contribs/cray/slurmsmwd/main.c +++ b/contribs/cray/slurmsmwd/main.c @@ -538,6 +538,7 @@ int main(int argc, char **argv) { pthread_t processing_thread, signal_handler_thread; pthread_attr_t thread_attr; + int pipefd; _parse_commandline(argc, argv); @@ -546,11 +547,15 @@ int main(int argc, char **argv) slurmsmwd_print_config(); if (!foreground) { - if (xdaemon()) + pipefd = xdaemon_init(); + if (pipefd == -1) error("daemon(): %m"); } if (create_pidfile("/var/run/slurmsmwd.pid", 0) < 0) fatal("Unable to create pidfile /var/run/slurmswmd.pid"); + if (!foreground) { + xdaemon_finish(pipefd); + } slurm_mutex_init(&down_node_lock); ++++++ split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch ++++++ From 1f12c590038c7f738ff19159629fdc38de5cba82 Mon Sep 17 00:00:00 2001 From: 
Christian Goll <[email protected]> Date: Mon, 9 Apr 2018 10:05:50 +0200 Subject: [PATCH 1/6] split xdaemon in xdaemon_init and xdaemon_finish for systemd compatibilty --- src/common/daemonize.c | 73 ++++++++++++++++++++++++++++++++++++++++++++------ src/common/daemonize.h | 10 +++++-- 2 files changed, 73 insertions(+), 10 deletions(-) diff --git a/src/common/daemonize.c b/src/common/daemonize.c index e22a1d0a7f..2987a40af0 100644 --- a/src/common/daemonize.c +++ b/src/common/daemonize.c @@ -53,31 +53,75 @@ #include "src/common/xassert.h" /* - * Double-fork and go into background. + * Start daemonization with double-fork and go into background. * Caller is responsible for umasks */ -int xdaemon(void) +int xdaemon_init(void) { - int devnull; - + int fds [2]; + int n; + signed char priority; + char ebuf [1024]; + /* + * Create pipe in order to get signal from grand child to terminate + */ + if (pipe (fds) < 0) { + error("Failed to create daemon pipe"); + } switch (fork()) { case 0 : break; /* child */ case -1 : return -1; - default : _exit(0); /* exit parent */ + default : { + if (close (fds[1]) < 0) { + error("Failed to close write-pipe in parent process"); + } + + /* + * get signal of grandchild to exit + */ + if ((n = read (fds[0], &priority, sizeof (priority))) < 0) { + error("Failed to read status from grandchild process"); + } + if ((n > 0) && (priority >= 0)) { + if ((n = read (fds[0], ebuf, sizeof (ebuf))) < 0) { + error("Failed to read err msg from grandchild process"); + } + if ((n > 0) && (ebuf[0] != '\0')) { + error("Error with forking and steeing up pipe: %s", ebuf); + } + return -1; + } + _exit(0); + } } if (setsid() < 0) return -1; - + if (close (fds[0]) < 0) { + error("Failed to close read-pipe in child process"); + } switch (fork()) { case 0 : break; /* child */ case -1: return -1; default: _exit(0); /* exit parent */ } + return (fds[1]); +} +/* + * finish daemonization after pidfile was written + */ + + +void xdaemon_finish(int fd) +{ /* - * dup 
stdin, stdout, and stderr onto /dev/null + * PID file was written, now do dup stdin, stdout, + * and stderr onto /dev/null and close pipe + * so that systemd realizes we are daemonized */ + int devnull; + devnull = open("/dev/null", O_RDWR); if (devnull < 0) error("Unable to open /dev/null: %m"); @@ -89,8 +133,21 @@ int xdaemon(void) error("Unable to dup /dev/null onto stderr: %m"); if (close(devnull) < 0) error("Unable to close /dev/null: %m"); + if ((fd >= 0) && (close (fd) < 0)) { + error( "Failed to close write-pipe in grandchild process"); + } +} + +/* + * keep depercated api + */ - return 0; +int xdaemon(void) +{ + int ret_val; + ret_val= xdaemon_init(); + xdaemon_finish(ret_val); + return ret_val; } /* diff --git a/src/common/daemonize.h b/src/common/daemonize.h index 22a31f6ccf..8b2a866b61 100644 --- a/src/common/daemonize.h +++ b/src/common/daemonize.h @@ -41,11 +41,17 @@ #define _HAVE_DAEMONIZE_H /* - * Fork process into background and inherit new session. + * Start fork process into background and inherit new session. * - * Returns -1 on error. */ extern int xdaemon(void); +extern int xdaemon_init(void); + +/* + * Finish daemonization by ending grandparen + */ + +extern void xdaemon_finish(int fd); /* Write pid into file pidfile if uid is not 0 change the owner of the * pidfile to that user. -- 2.13.7
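The pipe handshake that the xdaemon_init()/xdaemon_finish() split relies on can be illustrated in isolation. The following is a minimal sketch, not code from Slurm: the function name startup_handshake() and its fail parameter are hypothetical, and a single fork stands in for the double fork of the real patch. The parent blocks in read() until the child either closes the write end (clean EOF, startup finished, as after the pidfile has been written) or sends a status byte (startup failed).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Slurm: fork a child that reports the
 * end of its startup phase through a pipe.
 * fail != 0 makes the child send one status byte before closing the pipe.
 * Returns 0 if the parent saw a clean EOF (startup ok), 1 if it received
 * a status byte (startup failed), and -1 on a system call error.
 */
int startup_handshake(int fail)
{
	int fds[2];
	signed char status = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: stands in for the daemon process. */
		close(fds[0]);
		/* A real daemon would write its pidfile at this point. */
		if (fail) {
			status = 1;
			if (write(fds[1], &status, sizeof(status)) < 0)
				_exit(1);
		}
		/* Closing the write end is the "startup finished" signal. */
		close(fds[1]);
		_exit(0);
	}

	/*
	 * Parent: close our copy of the write end first, otherwise read()
	 * would never see EOF when the child closes its copy.
	 */
	close(fds[1]);
	n = read(fds[0], &status, sizeof(status));
	close(fds[0]);
	waitpid(pid, NULL, 0);

	if (n < 0)
		return -1;
	assert(n <= (ssize_t) sizeof(status));
	return (n == 0) ? 0 : 1;
}
```

In this sketch the parent only returns once the child has signalled the end of its startup phase, which is the property a service manager such as systemd depends on when it waits for the original process to exit before looking for the pidfile.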
