Hello All-
(I apologize if you receive this email twice. I'm unsure whether it is a
problem in torque, maui, or both and therefore I also posted it to the torque
list).
We're still having trouble with the procs= feature, and we are starting to shop
around for a torque/maui replacement in order to be able to use it. Before we
do that however, I wanted to see if anyone has any thoughts on how to address
the problem within torque/maui. Perhaps I simply don't understand the feature.
The versions of torque and maui we are using are:
torque-3.0.2
maui-3.2.6p21
Yes, we have tried newer versions of maui, but then the option doesn't work at
all.
Here is the scenario (I also included the conversation from November below for
more information).
Conceptually, our software is almost infinitely scalable in the sense that
there is very little overhead associated with interprocess communication.
Therefore, we do not require that all of the processes reside on a small number
of nodes. In fact, we can spread the processes across any and all nodes in the
cluster with ~zero loss in performance. So we can literally have one node that
has a single process running and another node that has 8 processes running.
Since we have that level of scalability, we don't want to lock ourselves into
requesting resources using the "nodes=X:ppn=Y" style, since that style requires
nodes to open up or drain before we can use them. Since our users run a big
mixture of single- and multi-processor jobs, waiting for node drain can waste a
lot of resources.
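To make that concrete, a typical request in the style we are trying to avoid
looks something like this (the node and processor counts are just made-up
examples, not our actual configuration):

#PBS -l nodes=5:ppn=8

With that style, the job has to wait until 5 nodes with 8 free cores each open
up (or are drained), even though our processes could just as well be scattered
across partially used nodes.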
I saw the "procs=#" the Requesting Resources table (see
http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml#resources for
more). It *appears* that this option should be able to allow the user to
request simply X*Y processors and the scheduler should be able to schedule them
any way it can fit. So using the following #PBS note, we should be able to
request 40 processors:
#PBS -l procs=40
Instead, we see that the scheduler seems to take this information, read it, and
basically disregard it. The reason I know it reads it is that if I ask for,
say, 40 processors and 40 processors are available in the cluster, it works as
expected and all is right with the world. Where it gets a bit more choppy is
when I ask for 40 processors and only 1 processor is available. The job doesn't
wait in the queue for the remaining 39 processors to open up; instead, PBS
simply starts the job on that one processor. I can't see how that is anything
but a bug. If the user is asking for 40 processors, why isn't the scheduler
waiting for all 40 processors to open up?
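For anyone trying to reproduce this, a minimal way to observe the behavior (the
script name and job id below are just placeholders for whatever your system
assigns):

qsub procs40.sh                      # job script containing "#PBS -l procs=40" and a long-running command
qstat -f 1234 | grep Resource_List   # torque's record of the request; we expect something like Resource_List.procs = 40
checkjob 1234                        # maui's view of the tasks requested vs. allocated
showq                                # how many processors were actually free at the time

In our case the job goes straight to the running state even when far fewer than
40 processors are free.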
If answering this question will require additional information, please ask. We
are at our wits' end here.
Thanks!
-Lance
On Nov 18, 2011, at 9:39 AM, Lance Westerhoff wrote:
>
> Hello All-
>
> I submitted the following to the torque list, but the more I look at it, the
> more I think it might be a scheduler problem. It appears that when running
> with the following specs, the procs= option does not actually work as
> expected.
>
> ==========================================
>
> #PBS -S /bin/bash
> #PBS -l procs=60
> #PBS -l pmem=700mb
> #PBS -l walltime=744:00:00
> #PBS -j oe
> #PBS -q batch
>
> torque version: tried 3.0.2. In v2.5.4, I think the procs option worked as
> documented.
> maui version: 3.2.6p21 (also tried maui 3.3.1 but it is a complete fail in
> terms of the procs option and it only asks for a single CPU)
>
> ==========================================
>
> If there are fewer than 60 processors available in the cluster (in this case
> there were 53 available), the job will go in and take whatever processors are
> remaining instead of waiting for all 60 processors to free up. Any thoughts
> as to why this might be happening? Sometimes it doesn't really matter and 53
> would be almost as good as 60, but if only 2 processors are available and
> the user asks for 60, I would hate for his job to go in anyway.
>
> Thank you for your time!
>
> -Lance
>
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers