On 21 Jul 2011, at 00:59, Caleb Phillips wrote:
> Hello all:
>
> I'm running torque 2.3.6 (packaged with Ubuntu 10.10) and maui 3.3.1.
> I'm having an issue where submitted jobs sit in the queue indefinitely.
> This was occurring with pbs_sched, so I installed maui hoping it would
> fix the problem. With maui, I have more information about the problem,
> but no resolution. I've spent several hours searching the torqueusers
> and mauiusers mailing lists, and reading the manuals, to no avail. I
> hope you can help...
>
> As far as I can tell, maui is complaining that there are not sufficient
> "feasible procs" for jobs to run because of a lack of "features". My
> nodes have no features enabled, and I'm not requesting any with my jobs.
> Yet, the jobs show up with "[1][ppn=1]" in the feature list. I don't
> know where these features are coming from or how to unset them, or if
> that's really the source of the problem (it's simply my best guess). Any
> ideas?
>
> Here's more information on my setup and how I reproduce the problem:
>
> I have one node (currently online). It has 48 processors:
>
>> caleb@torqueserver:~$ qnodes
>> fu48core.esl
>> state = free
>> np = 48
>> ntype = cluster
>> status = opsys=linux,uname=Linux 48core 2.6.32-25-server #45-Ubuntu SMP
>> Sat Oct 16 20:06:58 UTC 2010 x86_64,sessions=2834 5874 12296 13555 19465
>> 17575,nsessions=6,nusers=3,idletime=2308,totmem=82007668kb,availmem=73380372kb,physmem=82007668kb,ncpus=48,loadave=2.19,netload=24944834533,state=free,jobs=,varattr=,rectime=1311202191
>
> It's free and presumably happy:
>
>> caleb@torqueserver:/usr/local/maui$ checknode fu48core
>>
>> checking node fu48core.esl
>>
>> State: Idle (in current state for 5:15:40)
>> Configured Resources: PROCS: 48 MEM: 78G SWAP: 78G DISK: 1M
>> Utilized Resources: SWAP: 8426M
>> Dedicated Resources: [NONE]
>> Opsys: linux Arch: [NONE]
>> Speed: 1.00 Load: 2.240
>> Network: [DEFAULT]
>> Features: [NONE]
>> Attributes: [Batch]
>> Classes: [batch 48:48][amplhack 48:48][qualnet 48:48][lightweight 48:48]
>>
>> Total Time: 6:19:49 Up: 6:19:49 (100.00%) Active: 00:00:00 (0.00%)
>>
>> Reservations:
>> NOTE: no reservations on node
>
> The batch queue is empty. If I submit a very basic job (I've tried more
> complicated jobs too, with specific resource requests), it gets deferred
> immediately:
>
>> caleb@torqueserver:/usr/local/maui$ echo "sleep 30" | qsub
>> 25.torqueserver.esl
>> caleb@torqueserver:/usr/local/maui$ checkjob 25
>> checking job 25
>>
>> State: Idle EState: Deferred
>> Creds: user:caleb group:abelian class:batch qos:DEFAULT
>> WallTime: 00:00:00 of 1:00:00:00
>> SubmitTime: Wed Jul 20 16:52:37
>> (Time Queued Total: 00:00:31 Eligible: 00:00:00)
>>
>> Total Tasks: 1
>>
>> Req[0] TaskCount: 1 Partition: ALL
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [1][ppn=1]
>> NodeCount: 1
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 0
>> PartitionMask: [ALL]
>> Flags: RESTARTABLE
>>
>> job is deferred. Reason: NoResources (cannot create reservation for job
>> '25' (intital reservation attempt)
>> )
>> Holds: Defer (hold reason: NoResources)
>> PE: 1.00 StartPriority: 1
>> cannot select job 25 for partition DEFAULT (job hold active)
>
> If I release the job, I can see that maui's complaining about a lack of
> feasible procs due to unavailable features:
Could it be that your default queue in Torque sets some odd feature that is
not present on any of your nodes?
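
A rough way to check this, as a sketch only: dump the Torque configuration with qmgr and keep the resources_default lines, then compare them with what the job itself requests. The helper name filter_resource_settings is made up for illustration, the job ID 25 is the one from the checkjob output above, and I am assuming qmgr and qstat are on the PATH of the server host.

```shell
# Hypothetical helper: keep only the lines of qmgr-style output (read from
# stdin) that set default resources on the server or on a queue.
filter_resource_settings() {
    grep 'resources_default' || true
}

# Query the server only if qmgr is installed on this host.
if command -v qmgr >/dev/null 2>&1; then
    # "print server" dumps the server and queue settings as qmgr commands.
    qmgr -c 'print server' | filter_resource_settings
fi

# Show what the job itself asks for (job ID 25 is taken from the thread).
if command -v qstat >/dev/null 2>&1; then
    qstat -f 25 | grep 'Resource_List' || true
fi
```

If a resources_default line on the batch queue or on the server carries a node specification, that would be the first thing to compare against the Features line that checkjob reports for the job.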
R. David
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers