Naveed,
I believe that this is controlled by the NODEALLOCATIONPOLICY setting:
http://www.adaptivecomputing.com/resources/docs/maui/5.2nodeallocation.php

The default is LASTAVAILABLE.
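
As a rough sketch, the policy is a single line in maui.cfg, where the
parameter is spelled NODEALLOCATIONPOLICY. The value shown, MINRESOURCE, is
only an illustrative pick from the policies listed on that page, not a
recommendation, and it does not by itself randomize allocation:

```
# maui.cfg (illustrative value; see the linked page for the policies
# your Maui version supports)
NODEALLOCATIONPOLICY  MINRESOURCE
```
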
You may also want to look into node health check scripts to take faulty nodes 
offline automatically, though depending on the problems that you're having, 
it may be difficult to write a proper check script.
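
A minimal sketch of such a check script follows, in Python. It assumes the
TORQUE convention, as I understand it, that pbs_mom runs the script named by
$node_check_script (every $node_check_interval) and treats output beginning
with ERROR as a reported node problem. The scratch directory, the free-space
threshold, and the function names are hypothetical; real checks would need to
target whatever is actually failing on your nodes.

```python
#!/usr/bin/env python3
"""Minimal node health check sketch (hypothetical checks and thresholds)."""
import os
import shutil
import sys

# Assumed values, chosen only for illustration.
SCRATCH_DIR = "/tmp"            # assumed scratch location
MIN_FREE_BYTES = 1 * 1024**3    # require 1 GiB free in the scratch directory


def find_problems(scratch_dir=SCRATCH_DIR, min_free=MIN_FREE_BYTES):
    """Return a list of problem descriptions (empty if the node looks healthy)."""
    problems = []
    if not os.path.isdir(scratch_dir):
        problems.append(f"{scratch_dir} is missing")
    else:
        if not os.access(scratch_dir, os.W_OK):
            problems.append(f"{scratch_dir} is not writable")
        free = shutil.disk_usage(scratch_dir).free
        if free < min_free:
            problems.append(f"{scratch_dir} has only {free} bytes free")
    return problems


def format_report(problems):
    """Build the single output line; an empty string means healthy."""
    if not problems:
        return ""
    return "ERROR " + "; ".join(problems)


if __name__ == "__main__":
    report = format_report(find_problems())
    if report:
        # Assumption: output starting with ERROR is what pbs_mom reports.
        print(report)
    sys.exit(0)
```

Reporting a problem does not necessarily take the node out of service on its
own; whether the scheduler skips the node, or whether something still has to
run "pbsnodes -o" on it, depends on how the site is configured.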

Good Luck,
Mike Robbert

On Feb 3, 2011, at 10:33 AM, Naveed Near-Ansari wrote:


Is there a way to allocate nodes more randomly?  Currently our jobs seem
to be allocated to the nodes that were added to torque first.  This can
cause problems when one node starts having problems.  The jobs keep
getting allocated to the same node, causing the same failures, even when
there are hundreds of other nodes available.  Obviously pulling the node
as soon as possible is the right thing to do, but sometimes this can take
a while (like in the middle of the night when people are working).

I would still like individual jobs sent to the smallest number of nodes
(e.g. a 16-core job on 2 nodes), but have the nodes assigned in a more
random fashion rather than just the next one available in the list.  I
have read through the documentation and am not finding such an option,
but perhaps I missed or misunderstood something.

Let me know if this should be on the torque list, but I thought maui was
responsible for the allocation.

Naveed
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
