Thanks Ronny. Good point. On the other hand, preemption still
wouldn't work without it, since anyone else submitting to any other
queue wouldn't even get scheduled: MAUI couldn't get through the
whole list of those thousands of jobs each iteration. With this
setup I get those thousands of jobs out of MAUI so that it can still
process/schedule the other queues properly. Also, you are right, I
do turn it into a "50 jobs at a time" batching system... but only for
that one queue. All the other queues operate as expected. They don't
normally have submissions of thousands of jobs at a time.
On Jun 26, 2008, at 9:06 AM, Ronny T. Lampert wrote:
I had a similar situation with a user submitting 7,000 jobs at a
time. As you point out, maui can't seem to keep up with
scheduling all of them. After posting to the list it was suggested
that I create a routing queue in torque:
create queue physics
set queue physics queue_type = Route
set queue physics acl_group_enable = True
set queue physics route_destinations = pompeii
set queue physics enabled = True
set queue physics started = True
Then for the destination queue pompeii I put in the following rule:
set queue pompeii max_queuable = 50
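The rest of the pompeii definition is not shown here. As a minimal
sketch, assuming pompeii is an ordinary execution queue created
through qmgr, it might look like the following. Only the
max_queuable line comes from the setup described above; the other
lines are assumptions for illustration.

```
# Hypothetical definition of the destination queue (qmgr commands).
# Only max_queuable = 50 is taken from the setup described above;
# the remaining attributes are assumed defaults for an execution queue.
create queue pompeii
set queue pompeii queue_type = Execution
set queue pompeii max_queuable = 50
set queue pompeii enabled = True
set queue pompeii started = True
```

With something like this in place, jobs beyond the limit wait in the
physics routing queue until pompeii has room again.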
This setup is working well. Torque manages to keep 50 jobs in the
pompeii execution queue at all times. Maui is happy since it
doesn't have to go through thousands of jobs each iteration, which
it couldn't run anyhow due to lack of resources. (I wish we had
thousands ;-)).
Please note that preemption triggered by ANY newer job will NO
LONGER WORK with this setup, since maui only applies its scheduling
algorithms to those 50 jobs.
The same goes for higher-priority jobs or similar that must be
executed ASAP.
You essentially turn your setup into a "50 jobs at a time" batching
system.
So, depending on your needs, you should increase max_queuable.
Before maui I managed to run a heavily patched pbs_sched (early
torque releases) with, I think, around 20k+ jobs queued.
After that I abandoned that setup because I needed preemption
(sorry, no docs left from that time).
I had maui running with 10k+ jobs (and changed the #define so it
would consider 8K instead of 4K jobs for real scheduling), but it's
not nice and it'll eat memory like it's sugar (500MB+ RSS).
And I still think scheduling over just 8K jobs is far too little for
such a system.
Because I no longer have this setup in operation at the moment, I
stopped working privately on maui to remedy those shortcomings.
BR,
Ronny
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers