Hello All-

(I apologize if you receive this email twice. I'm unsure whether it is a 
problem in torque, maui, or both and therefore I also posted it to the torque 
list).

We're still having trouble with this feature, and we are starting to shop 
around for a torque/maui replacement so that we can actually use it. Before we 
do that, however, I wanted to see if anyone has any thoughts on how to address 
the problem within torque/maui. Perhaps I simply don't understand the feature. 
The versions of torque and maui we are using are:

        torque-3.0.2
        maui-3.2.6p21

Yes, we have tried newer versions of maui, but then the option doesn't work at 
all.

Here is the scenario (I also included the conversation from November below for 
more information). 

Conceptually, our software is almost infinitely scalable in the sense that 
there is very little overhead associated with interprocess communication. 
Therefore, we do not require that all of the processes reside on a small number 
of nodes. In fact, we can spread the processes across any and all nodes in the 
cluster with ~zero loss in performance, so we can literally have one node 
running a single process and another node running 8 processes. Given that 
level of scalability, we don't want to lock ourselves into requesting 
resources in the "nodes=X:ppn=Y" style, since that style requires nodes to 
open up or drain before they can be used. Because our users run a big mixture 
of single- and multi-processor jobs, waiting for node drain can waste a lot of 
resources.
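
For comparison, here is roughly what the two request styles look like (the 
numbers are just illustrative, not our actual job sizes):

# packed style: requires 5 nodes that each have 8 free slots,
# so the job waits for whole nodes to open up or drain
#PBS -l nodes=5:ppn=8

# what we want: 40 slots total, placed on whatever nodes have room
#PBS -l procs=40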

I saw the "procs=#" option in the Requesting Resources table (see 
http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml#resources for 
more). It *appears* that this option should allow the user to simply request 
X*Y processors and let the scheduler place them any way it can fit them. So 
with the following #PBS directive, we should be able to request 40 processors:

#PBS -l procs=40

Instead, we see that the scheduler seems to take this information, read it, and 
basically disregard it. The reason I know it reads it is that if I ask for, 
say, 40 processors and 40 processors are available in the cluster, it works as 
expected and all is right with the world. Where it gets choppy is when I ask 
for 40 processors and only 1 processor is available. The job doesn't wait in 
the queue for the remaining 39 processors to open up; instead PBS simply 
starts the job on that one processor. I can't see how that is anything but a 
bug. If the user is asking for 40 processors, why isn't the scheduler waiting 
for all 40 processors to open up?
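
In case it helps anyone reproduce this, a simple way to see what the scheduler 
actually handed a job is to count the entries in $PBS_NODEFILE from inside the 
job script (a minimal diagnostic sketch; the echo wording is mine, not 
anything printed by torque or maui):

#!/bin/bash
#PBS -l procs=40
#PBS -j oe

# $PBS_NODEFILE lists one line per processor slot assigned to this job
NSLOTS=$(wc -l < "$PBS_NODEFILE")
echo "Requested 40 slots, was given $NSLOTS"

# show how the slots are spread across the nodes
sort "$PBS_NODEFILE" | uniq -c

If procs= were being honored, that count should never come up short of 40 once 
the job starts.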

If answering this question requires additional information, please ask. We are 
at our wits' end here.

Thanks!

-Lance


On Nov 18, 2011, at 9:39 AM, Lance Westerhoff wrote:

> 
> Hello All-
> 
> I submitted the following to the torque list, but the more I look at it, the 
> more I think it might be a scheduler problem. It appears that when running 
> with the following specs, the procs= option does not actually work as 
> expected.
> 
> ==========================================
> 
> #PBS -S /bin/bash
> #PBS -l procs=60
> #PBS -l pmem=700mb
> #PBS -l walltime=744:00:00
> #PBS -j oe
> #PBS -q batch
> 
> torque version: tried 3.0.2. In v2.5.4, I think the procs option worked as 
> documented.
> maui version: 3.2.6p21 (also tried maui 3.3.1, but it is a complete fail in 
> terms of the procs option; it only asks for a single CPU)
> 
> ==========================================
> 
> If there are fewer than 60 processors available in the cluster (in this case 
> there were 53 available), the job will go in and take whatever processors 
> remain instead of waiting for all 60 processors to free up. Any thoughts as 
> to why this might be happening? Sometimes it doesn't really matter and 53 
> would be almost as good as 60; however, if only 2 processors are available 
> and the user asks for 60, I would hate for the job to start anyway.
> 
> Thank you for your time!
> 
> -Lance
> 
