I'm checking with the Slurm folks on the exact "tasks_per_node" and "cpus_per_task"
behavior, for present and future Slurm versions.
Ralph Castain wrote:
We've gone around on this one a few times too. We finally settled on the
current formula and confirmed it did what the slurm folks expected, so I'm
somewhat loath to change it given that situation.
I suggest you take it up with the slurm folks to find out what behavior is
expected when tasks_per_node and cpus_per_task are set. How many application
processes are expected to be run on the node?
Part of the problem (as I recall) was that the meaning of tasks_per_node changed across a slurm
release. At one time, it actually meant "cpus_per_node", and so you had to do the
division to get the ppn correct. I'm not sure what it means today, but since Livermore writes
slurm and the folks there seem to be happy with the way this behaves...<shrug>
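For illustration only (made-up numbers, not the actual module code), the old
interpretation forced a division like this:

/* Under the old meaning, "tasks_per_node" actually reported cpus, so
 * the per-node process count had to be derived by hand. */
int tasks_per_node = 8;   /* really cpus on the node, in that slurm release */
int cpus_per_task  = 2;   /* from the user's request */
int ppn = tasks_per_node / cpus_per_task;   /* 8 / 2 = 4 processes per node */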
Let me know what you find out.
On Feb 26, 2010, at 9:45 AM, Damien Guinier wrote:
Hi Ralph,
I found a minor bug in the MCA component "ras slurm".
It behaves incorrectly with the "X number of processors per
task" feature.
In the file orte/mca/ras/slurm/ras_slurm_module.c, line 356:
- The node slot count is divided by the "cpus_per_task" value,
but "cpus_per_task" is already taken into account at line 285.
My proposal is not to divide the node slot count a second time.
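To illustrate the double division with simplified, hypothetical values (the
real parsing code is more involved):

/* Say the node was allocated 8 cpus and cpus_per_task is 2.
 * The parsing at line 285 already divides while filling slots[]: */
int cpus_per_task = 2;
int slots_i = 8 / cpus_per_task;            /* slots[i] == 4, the correct ppn */
/* Line 356 then divides a second time: */
int node_slots = slots_i / cpus_per_task;   /* == 2, too few slots */
/* Removing the second division keeps the correct value, 4. */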
My patch is:
diff -r ef9d639ab011 -r 8f62269014c2 orte/mca/ras/slurm/ras_slurm_module.c
--- a/orte/mca/ras/slurm/ras_slurm_module.c Wed Jan 20 18:29:12 2010 +0100
+++ b/orte/mca/ras/slurm/ras_slurm_module.c Thu Feb 25 15:59:41 2010 +0100
@@ -353,7 +353,7 @@
         node->state = ORTE_NODE_STATE_UP;
         node->slots_inuse = 0;
         node->slots_max = 0;
-        node->slots = slots[i] / cpus_per_task;
+        node->slots = slots[i];
         opal_list_append(nodelist, &node->super);
     }
     free(slots);
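With this change, to take the same example, a node allocated 8 cpus with
cpus_per_task=2 ends up with 4 slots (the single division at line 285)
instead of 2 (divided again at line 356).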
Thanks,
Damien