Hm, strange. I don't see a problem with the time specs, although I
would use
*/5 * * * *
to run something every 5 minutes. In my scrontab I also specify a
partition, etc. But I don't think that is necessary.
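For example, a minimal scrontab entry along those lines could look like this (the partition name and script path are only placeholders):

#SCRON --partition=debug
#SCRON --time=00:05:00
*/5 * * * * /path/to/crontest.sh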
regards
magnus
On Tue, 2024-05-07 at 12:06 -0500, Sandor via slurm-users wrote:
> I am
I am working out the details of scrontab. My initial testing has raised a
question I cannot solve.
Within the scrontab editor I have the following example from the Slurm
documentation:
0,5,10,15,20,25,30,35,40,45,50,55 * * * *
/directory/subdirectory/crontest.sh
When I save it, scrontab marks the line
On 5/7/24 15:32, Henderson, Brent via slurm-users wrote:
Over the past few days I grabbed some time on the nodes and ran for a few
hours. Looks like I *can* still hit the issue with cgroups disabled. Incident
rate was 8 out of >11k jobs so dropped an order of magnitude or so. Guessing
that exonerates cgroups as the cause, but possibly just a good
Are you seeking something simple rather than sophisticated? If so, you can
use the controller's local disk for StateSaveLocation and place a cron job
(on the same node or somewhere else) to copy that data out via e.g. rsync
and put it where you need it (NFS?) for the backup control node to use
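For example, something along these lines in root's crontab on the controller
would sync the state directory every few minutes (the paths and the backup
hostname are just placeholders):

*/5 * * * * rsync -a --delete /var/spool/slurmctld/ backupctl:/var/spool/slurmctld/

Keep in mind the copy can be slightly stale or inconsistent if slurmctld is
writing at that moment, so this is a best-effort backup rather than true
shared storage.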
You can try DRBD
https://linbit.com/drbd/
or a shared-disk (clustered) filesystem like GFS2, OCFS2, etc.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html-single/configuring_gfs2_file_systems/index
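Whichever shared-storage route you take, the slurm.conf side is roughly the
same; a sketch with placeholder hostnames and path:

SlurmctldHost=ctl-primary
SlurmctldHost=ctl-backup
StateSaveLocation=/shared/slurmctld/state

The first SlurmctldHost entry acts as the primary and the second takes over
when it is unreachable; both need read/write access to StateSaveLocation.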
Hi there,
We've updated to 23.11.6 and replaced MUNGE with SACK.
Performance and stability have both been pretty good, but we're
occasionally seeing this in the slurmctld.log
[2024-05-07T03:50:16.638] error: decode_jwt: token expired at 1715053769
[2024-05-07T03:50:16.638] error:
Hi all,
I am looking for a clean way to set up Slurm's native high-availability
feature. I am managing a Slurm cluster with one control node (hosting
both slurmctld and slurmdbd), one login node and a few dozen compute
nodes. I have a virtual machine that I want to set up as a backup
control
Tim Wickberg via slurm-users writes:
> [1] Slinky is not an acronym (neither is Slurm [2]), but loosely
> stands for "Slurm in Kubernetes".
And not at all inspired by Slinky Dog in Toy Story, I guess. :D
--
Cheers,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University