I tried some new parameters.
Output of 'print server' from qmgr:
----------------
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.mem = 2000mb
set queue batch resources_default.nodes = 1
set queue batch resources_default.pvmem = 16000mb
set queue batch resources_default.walltime = 06:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = [EMAIL PROTECTED]
set server operators = [EMAIL PROTECTED]
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.8
----------------------
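One thing I notice in the dump above: resources_default.pvmem = 16000mb is applied per process whenever a job does not request pvmem itself. If that default turns out to be the culprit (see my reading of the checkjob output below), it could be lowered or removed with qmgr. A minimal sketch, assuming standard qmgr syntax; the 7000mb value is only an example, not a tested figure:

# lower the per-process virtual memory default on the batch queue
qmgr -c "set queue batch resources_default.pvmem = 7000mb"
# or remove the default entirely, so jobs must request pvmem themselves
qmgr -c "unset queue batch resources_default.pvmem"
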
checkjob output:
----------------------
checking job 90 (RM job '90.em-research00')
State: Idle EState: Deferred
Creds: user:abaqus group:users class:batch qos:DEFAULT
WallTime: 00:00:00 of 5:00:00
SubmitTime: Tue May 15 11:59:03
(Time Queued Total: 1:58:17 Eligible: 00:00:00)
Total Tasks: 4
Req[0] TaskCount: 4 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 15G
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 250M SWAP: 15G
NodeAccess: SHARED
TasksPerNode: 2 NodeCount: 2
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
SystemQueueTime: Tue May 15 13:00:06
Flags: RESTARTABLE
job is deferred. Reason: NoResources (cannot create reservation for
job '90' (intital reservation attempt)
)
Holds: Defer (hold reason: NoResources)
PE: 6.07 StartPriority: 57
cannot select job 90 for partition DEFAULT (job hold active)
-------------------
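If I'm reading the checkjob output above correctly, the swap numbers explain the deferral. A back-of-the-envelope check (my interpretation, using the per-task figures above and the checknode output quoted further down):

# per task:   PROCS: 1   MEM: 250M   SWAP: 15G  (15G matches the 16000mb pvmem default)
# layout:     TasksPerNode: 2, NodeCount: 2
# swap needed per node:  2 tasks x 15G = 30G
# configured swap:       em-research00: 33G (fits); second node: 17G (does not)
# => Maui cannot place two tasks on the smaller node, so no reservation
#    can be created and the job is deferred with NoResources.
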
PBS script:
-------------------
#!/bin/bash
#PBS -l nodes=2:ppn=2
#PBS -l walltime=05:00:00
#PBS -l mem=1000mb
#PBS -l vmem=7000mb
#PBS -j oe
#PBS -M [EMAIL PROTECTED]
#PBS -m bae
# Recreate the submit directory on this node, fetch the input, and cd into it
mkdir -p $PBS_O_WORKDIR
string="$PBS_O_WORKDIR/plus2gb.inp"
scp 10.1.0.52:$string $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
#module load abaqus
#
/Apps/abaqus/Commands/abaqus job=plus2gb queue=abaqus4cpu \
    input=Standard_plus2gbyte.inp cpus=4
---------------------------
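Following Rishi's pvmem suggestion quoted further down, one variant still worth trying is to request per-process virtual memory in the script itself, so the 16000mb queue default is not applied on top of it. A sketch only; the 7000mb figure simply mirrors my current vmem request and is not a tested value:

#PBS -l nodes=2:ppn=2
#PBS -l pvmem=7000mb
# pvmem is a per-process limit, so with ppn=2 this amounts to roughly
# 14000mb of virtual memory per node, which both machines can supply.
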
Abaqus environment file:
--------------------------
import os
os.environ['LAMRSH'] = 'ssh'
max_cpus=6
mp_host_list=[['em-research00',3],['10.1.0.97',2]]
run_mode = BATCH
scratch = "/home/abaqus"
queue_name=["cpu","abaqus4cpu"]
queue_cmd="qsub -r n -q batch -S /bin/bash -V -l nodes=1:ppn=1 %S"
cpu="qsub -r n -q batch -S /bin/bash -V -l nodes=1:ppn=2 %S"
abaqus4cpu="qsub -r n -q batch -S /bin/bash -V -l nodes=2:ppn=2 %S"
pre_memory = "3000 mb"
standard_memory = "7000 mb"
---------------------------
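For what it's worth, my understanding of the queue definitions above (an assumption about how Abaqus expands them; %S should be replaced by the script Abaqus generates) is that queue=abaqus4cpu ends up running something like:

qsub -r n -q batch -S /bin/bash -V -l nodes=2:ppn=2 <generated script>

Note that this inner qsub carries no mem/vmem/pvmem request of its own, so the queue defaults would apply to it in full.
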
But still no change.
Thanks for all the help so far.
rishi pathak wrote:
> Also try, in your job script file:
> #PBS -l pvmem=<amount of virtual memory>
>
> On 5/15/07, *rishi pathak* <[EMAIL PROTECTED]> wrote:
>
> I did not see any specific queue in the submit script.
> Have you specified the following for the queue you are using?
>
> resources_default.mem    # available RAM
> resources_default.pvmem  # virtual memory
>
> On 5/15/07, *Daniel Boone* <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> I need to use the swap. I know I don't have enough RAM, but the job
> must be able to run, even if it swaps a lot. Time is not an issue here.
> On one machine the job uses about 7.4GB of swap, and we don't have any
> other machines with more RAM to run it on. The other option would be to
> run the job outside torque/maui, but I'd rather not do that.
>
> Can someone tell me how to read the checkjob -v output? I don't
> understand how to find the errors in it.
>
> rishi pathak wrote:
> > Hi
> > The system memory (RAM) available per process is less than the
> > requested amount. It is not considering swap as an extension of RAM.
> > Try with a reduced system memory request.
> >
> > On 5/14/07, *Daniel Boone* <[EMAIL PROTECTED]> wrote:
> >
> > Hi
> >
> > I'm having the following problem. When I submit a very
> > memory-intensive (mostly swap) job, the job doesn't want to start.
> > It gives the error: cannot select job 62 for partition DEFAULT
> > (job hold active)
> > But I don't understand what the error means.
> >
> > I'm running torque 2.1.8 with maui 3.2.6p19.
> >
> > checkjob -v returns the following:
> > -------------------
> > checking job 62 (RM job '62.em-research00')
> >
> > State: Idle EState: Deferred
> > Creds: user:abaqus group:users class:batch qos:DEFAULT
> > WallTime: 00:00:00 of 6:00:00
> > SubmitTime: Mon May 14 14:13:41
> > (Time Queued Total: 1:53:39 Eligible: 00:00:00)
> >
> > Total Tasks: 4
> >
> > Req[0] TaskCount: 4 Partition: ALL
> > Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> > Opsys: [NONE] Arch: [NONE] Features: [NONE]
> > Exec: '' ExecSize: 0 ImageSize: 0
> > Dedicated Resources Per Task: PROCS: 1 MEM: 3875M
> > NodeAccess: SHARED
> > TasksPerNode: 2 NodeCount: 2
> >
> >
> > IWD: [NONE] Executable: [NONE]
> > Bypass: 0 StartCount: 0
> > PartitionMask: [ALL]
> > SystemQueueTime: Mon May 14 15:14:13
> >
> > Flags: RESTARTABLE
> >
> > job is deferred. Reason: NoResources (cannot create reservation for
> > job '62' (intital reservation attempt)
> > )
> > Holds: Defer (hold reason: NoResources)
> > PE: 19.27 StartPriority: 53
> > cannot select job 62 for partition DEFAULT (job hold active)
> > ------------------------
> > checknode of the two nodes:
> > ------------
> > checking node em-research00
> > State: Idle (in current state for 2:31:21)
> > Configured Resources: PROCS: 3 MEM: 2010M SWAP: 33G DISK: 72G
> >
> > Utilized Resources: DISK: 9907M
> > Dedicated Resources: [NONE]
> > Opsys: linux Arch: [NONE]
> > Speed: 1.00 Load: 0.000
> > Network: [DEFAULT]
> > Features: [F]
> > Attributes: [Batch]
> > Classes: [batch 3:3]
> >
> > Total Time: 2:29:18  Up: 2:29:18 (100.00%)  Active: 00:00:00 (0.00%)
> >
> > Reservations:
> > NOTE: no reservations on node
> >
> > --------------------
> > State: Idle (in current state for 2:31:52)
> > Configured Resources: PROCS: 2 MEM: 2012M SWAP: 17G DISK: 35G
> > Utilized Resources: DISK: 24G
> > Dedicated Resources: [NONE]
> > Opsys: linux Arch: [NONE]
> > Speed: 1.00 Load: 0.590
> > Network: [DEFAULT]
> > Features: [NONE]
> > Attributes: [Batch]
> > Classes: [batch 2:2]
> >
> > Total Time: 2:29:49  Up: 2:29:49 (100.00%)  Active: 00:00:00 (0.00%)
> >
> > Reservations:
> > NOTE: no reservations on node
> > -----------------
> > The PBS script I'm using:
> > #!/bin/bash
> > #PBS -l nodes=2:ppn=2
> > #PBS -l walltime=06:00:00
> > #PBS -l mem=15500mb
> > #PBS -j oe
> > # Recreate the submit directory on this node and fetch the input
> > mkdir -p $PBS_O_WORKDIR
> > string="$PBS_O_WORKDIR/plus2gb.inp"
> > scp 10.1.0.52:$string $PBS_O_WORKDIR
> > #scp 10.1.0.52:$PBS_O_WORKDIR'/'$PBS_JOBNAME ./
> > cd $PBS_O_WORKDIR
> > #module load abaqus
> > #
> > /Apps/abaqus/Commands/abaqus job=plus2gb queue=cpu2 \
> >     input=Standard_plus2gbyte.inp cpus=4 mem=15000mb
> > ---------------------------
> > If you need some extra info please let me know.
> >
> > Thank you
> >
> > _______________________________________________
> > mauiusers mailing list
> > [email protected]
> <mailto:[email protected]> <mailto:
> [email protected] <mailto:[email protected]>>
> > http://www.supercluster.org/mailman/listinfo/mauiusers
> >
> > --
> > Regards--
> > Rishi Pathak
> > National PARAM Supercomputing Facility
> > Center for Development of Advanced Computing(C-DAC)
> > Pune University Campus,Ganesh Khind Road
> > Pune-Maharastra
>
> --
> Regards--
> Rishi Pathak
> National PARAM Supercomputing Facility
> Center for Development of Advanced Computing(C-DAC)
> Pune University Campus,Ganesh Khind Road
> Pune-Maharastra
>
> --
> Regards--
> Rishi Pathak
> National PARAM Supercomputing Facility
> Center for Development of Advanced Computing(C-DAC)
> Pune University Campus,Ganesh Khind Road
> Pune-Maharastra
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers