Our setup is a bit simpler, with only two levels of switches, and maui seems to
support that rather well.
In torque one can set node properties that describe which switch each node is
connected to. For instance, c3-4 has a property named switch3, and so on:
# pbsnodes c3-4
c3-4
     state = job-exclusive
     np = 8
     properties = ib,switch3,highmem
     ntype = cluster
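
For reference, properties like these are normally set in torque's nodes file
(usually server_priv/nodes under the torque home directory), one line per node
in the form "name np=N property property ...". Below is a minimal sketch, in
Python, of generating such a line. The helper name nodes_file_line is made up
for illustration, and the values are simply the ones from the pbsnodes output
above.

```python
def nodes_file_line(name, np, properties):
    """Build one torque nodes-file line: '<name> np=<n> <prop> <prop> ...'."""
    return " ".join([name, f"np={np}", *properties])


# Example values taken from the pbsnodes output for c3-4.
print(nodes_file_line("c3-4", 8, ["ib", "switch3", "highmem"]))
# prints: c3-4 np=8 ib switch3 highmem
```
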
Then we put this in maui.cfg:
NODESETPOLICY ONEOF
NODESETATTRIBUTE FEATURE
NODESETDELAY 96:00:00
NODESETLIST switch1 switch2 switch3 switch4 switch5 switch6 switch7 switch8 switch9 switch10 switch11 switch12 switch13 switch14 switch15 switch16 switch17 switch18 switch19 switch20 switch21 switch22 switch23 switch24 switch25 switch26 switch27 switch28 switch29 switch30 switch31 switch32 switch33 switch34 switch35 switch36 switch37 switch38 switch39 switch40 switch41 switch42 switch43 switch44
This makes maui schedule a job within one switch if it can do so within 96
hours. If you set NODESETDELAY to zero, maui will only schedule within one
switch if that is immediately possible with the available resources.
I do not know if it is possible to extend this to deeper levels of switching.
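
With this many switches, the NODESETLIST entry is easier to generate than to
type. Below is a minimal sketch in Python, assuming the features are named
switch1 through switchN as in our case. The function name nodesetlist_line is
made up for illustration.

```python
def nodesetlist_line(prefix, count):
    """Return a maui.cfg NODESETLIST line naming <prefix>1 .. <prefix><count>."""
    names = [f"{prefix}{i}" for i in range(1, count + 1)]
    return "NODESETLIST " + " ".join(names)


# 44 switches, as in the configuration above.
print(nodesetlist_line("switch", 44))
```

The output is a single line that can be pasted into maui.cfg.
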
We moved away from this setup because we couldn't measure any performance
difference for our apps between running within one switch and running across
switches. Instead we started scheduling strictly on the type of network the
nodes are connected to. Some of our nodes have InfiniBand and others only
Gigabit Ethernet.
NODESETPOLICY ONEOF
NODESETATTRIBUTE FEATURE
NODESETDELAY 96:00:00
NODESETLIST gige ib
This prevents any job from running across the different network types, which
solved some issues we saw with Open MPI applications.
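
A node that carries neither feature, or both, does not fall cleanly into one
node set, so it may be worth checking the properties before relying on this.
Below is a rough sketch in Python that parses pbsnodes-style output (in the
format shown earlier) and lists such nodes. It assumes every compute node is
meant to carry exactly one of gige or ib. The function names and the sample
nodes other than c3-4 are made up for illustration.

```python
def parse_pbsnodes(text):
    """Map node name -> list of properties from pbsnodes-style output."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            # Attribute line such as 'properties = ib,switch3,highmem'.
            key, value = line.split(" = ", 1)
            if current is not None and key == "properties":
                nodes[current] = value.split(",")
        else:
            # Any other non-empty line is taken to be a node name.
            current = line
            nodes[current] = []
    return nodes


def nodes_outside_one_set(nodes, node_sets=("gige", "ib")):
    """Return nodes carrying none, or more than one, of the node-set features."""
    return sorted(
        name for name, props in nodes.items()
        if sum(p in node_sets for p in props) != 1
    )


# Hypothetical sample; in practice this would be the output of 'pbsnodes -a'.
sample = """
c3-4
     state = job-exclusive
     np = 8
     properties = ib,switch3,highmem
     ntype = cluster

c1-1
     state = free
     np = 8
     properties = gige
     ntype = cluster

c2-2
     state = free
     np = 8
     properties = highmem
     ntype = cluster
"""

print(nodes_outside_one_set(parse_pbsnodes(sample)))
# prints: ['c2-2']
```
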
Anyway, hope this helps.
Regards,
r.
_______________________________________________
mauiusers mailing list
http://www.supercluster.org/mailman/listinfo/mauiusers