Because I forgot to send this mail to the mailing list, I'm forwarding
every mail regarding this issue.


Also try this in your job script file:
#PBS -l pvmem=<amount of virtual memory>
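
For example, a minimal sketch of the directive (the 4gb figure is an
assumption, size it to your job; nodes and walltime mirror the script
below). Unlike mem, which Maui checks against a node's physical RAM,
pvmem requests per-process virtual memory, which should be checked
against the node's SWAP figure (the 33G/17G in the checknode output):

#!/bin/bash
#PBS -l nodes=2:ppn=2
#PBS -l walltime=06:00:00
# per-process virtual memory (RAM + swap) instead of physical memory
#PBS -l pvmem=4gb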

On 5/15/07, rishi pathak <[EMAIL PROTECTED]> wrote:

    I did not see any specific queue in the submit script.
    Have you specified the following for the queue you are using?

    resources_default.mem    # available RAM
    resources_default.pvmem  # virtual memory
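
    For example, a sketch of the qmgr commands to set these (assuming
    the queue is named batch, as in your checkjob output; the values
    are placeholders to adjust to your nodes):

    qmgr -c "set queue batch resources_default.mem = 1900mb"
    qmgr -c "set queue batch resources_default.pvmem = 8gb"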

      



    On 5/15/07, Daniel Boone <[EMAIL PROTECTED]> wrote:

        Hi

        I need to use the swap. I know I don't have enough RAM, but the
        job must be able to run, even if it swaps a lot. Time is not an
        issue here. On one machine the job uses about 7.4GB of swap, and
        we don't have any other machines with more RAM to run it on. The
        other option is to run the job outside Torque/Maui, but I'd
        rather not do that.

        Can someone tell me how to read the checkjob -v output? I don't
        understand how to find the errors in it.

        rishi pathak wrote:
        > Hi
        > The system memory (RAM) available per process is less than the
        > requested amount. Maui is not considering swap as an extension
        > of RAM. Try with a reduced system memory request.
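        > For example (a sketch; the 3800mb figure is an assumption,
        > chosen so that two tasks per node stay under the 2010M of RAM
        > each node reports):
        > #PBS -l mem=3800mb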
        >
        >
        >
        > On 5/14/07, Daniel Boone <[EMAIL PROTECTED]> wrote:
        >
        >     Hi
        >
        >     I'm having the following problem. When I submit a very
        >     memory-intensive (mostly swap) job, the job won't start.
        >     It gives the error: cannot select job 62 for partition
        >     DEFAULT (job hold active)
        >     But I don't understand what the error means.
        >
        >     I run Torque 2.1.8 with Maui 3.2.6p19.
        >
        >     checkjob -v returns the following:
        >     -------------------
        >     checking job 62 (RM job '62.em-research00')
        >
        >     State: Idle  EState: Deferred
        >     Creds:  user:abaqus  group:users  class:batch  qos:DEFAULT
        >     WallTime: 00:00:00 of 6:00:00
        >     SubmitTime: Mon May 14 14:13:41
        >     (Time Queued  Total: 1:53:39  Eligible: 00:00:00)
        >
        >     Total Tasks: 4
        >
        >     Req[0]  TaskCount: 4  Partition: ALL
        >     Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
        >     Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
        >     Exec:  ''  ExecSize: 0  ImageSize: 0
        >     Dedicated Resources Per Task: PROCS: 1  MEM: 3875M
        >     NodeAccess: SHARED
        >     TasksPerNode: 2  NodeCount: 2
        >
        >
        >     IWD: [NONE]  Executable:  [NONE]
        >     Bypass: 0  StartCount: 0
        >     PartitionMask: [ALL]
        >     SystemQueueTime: Mon May 14 15:14:13
        >
        >     Flags:       RESTARTABLE
        >
        >     job is deferred.  Reason:  NoResources  (cannot create reservation for
        >     job '62' (intital reservation attempt)
        >     )
        >     Holds:    Defer  (hold reason:  NoResources)
        >     PE:  19.27  StartPriority:  53
        >     cannot select job 62 for partition DEFAULT (job hold active)
        >     ------------------------
        >     checknode of the two nodes:
        >     checking node em-research00
        >     ------------
        >     State:      Idle  (in current state for 2:31:21)
        >     Configured Resources: PROCS: 3  MEM: 2010M  SWAP: 33G  DISK: 72G
        >
        >
        >     Utilized   Resources: DISK: 9907M
        >     Dedicated  Resources: [NONE]
        >     Opsys:         linux  Arch:      [NONE]
        >     Speed:      1.00  Load:       0.000
        >     Network:    [DEFAULT]
        >     Features:   [F]
        >     Attributes: [Batch]
        >     Classes:    [batch 3:3]
        >
        >     Total Time: 2:29:18  Up: 2:29:18 (100.00%)  Active: 00:00:00 (0.00%)
        >
        >     Reservations:
        >     NOTE:  no reservations on node
        >
        >     --------------------
        >     State:      Idle  (in current state for 2:31:52)
        >     Configured Resources: PROCS: 2  MEM: 2012M  SWAP: 17G  DISK: 35G
        >     Utilized   Resources: DISK: 24G
        >     Dedicated  Resources: [NONE]
        >     Opsys:         linux  Arch:      [NONE]
        >     Speed:      1.00  Load:       0.590
        >     Network:    [DEFAULT]
        >     Features:   [NONE]
        >     Attributes: [Batch]
        >     Classes:    [batch 2:2]
        >
        >     Total Time: 2:29:49  Up: 2:29:49 (100.00%)  Active: 00:00:00 (0.00%)
        >
        >     Reservations:
        >     NOTE:  no reservations on node
        >     -----------------
        >     The PBS script I'm using:
        >     #!/bin/bash
        >     #PBS -l nodes=2:ppn=2
        >     #PBS -l walltime=06:00:00
        >     #PBS -l mem=15500mb
        >     #PBS -j oe
        >     # Recreate the submit directory on the execution node
        >     mkdir $PBS_O_WORKDIR
        >     # Copy the input file over from the submit host
        >     string="$PBS_O_WORKDIR/plus2gb.inp"
        >     scp 10.1.0.52:$string $PBS_O_WORKDIR
        >     #scp 10.1.0.52:$PBS_O_WORKDIR'/'$PBS_JOBNAME ./
        >     # Go to the directory from which the job was submitted
        >     cd $PBS_O_WORKDIR
        >     #module load abaqus
        >     /Apps/abaqus/Commands/abaqus job=plus2gb queue=cpu2 input=Standard_plus2gbyte.inp cpus=4 mem=15000mb
        >     ---------------------------
        >     If you need any extra info, please let me know.
        >
        >     Thank you
        >








-- 
Regards--
Rishi Pathak
National PARAM Supercomputing Facility
Center for Development of Advanced Computing(C-DAC)
Pune University Campus,Ganesh Khind Road
Pune-Maharastra
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
