>There is a Torque command named 'qhold' which will hold running Torque jobs.
>Try 'man qhold'.

See the output please:

mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:13:20 R slow
mahmood@server:~$ qhold 817
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:13:51 R slow
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:14:36 R slow

I ran the last command one minute later and, as you can see, the job is still running. The "top" command also shows that the process is active.

>With qhold/qrls, you're telling a running job
>to checkpoint and vacate the node (if supported) and enter the HELD
>state, or for a queued job to enter a HELD state and not be scheduled
>for execution until it's been qrls'd to the QUEUED state.

It seems that this is not supported by my job manager: qhold has no effect.

>mjobctl -s jobID # suspend
>mjobctl -r jobID # resume

Thanks for that. It really does suspend the job:

mahmood@server:~$ mjobctl -s 817
job 817 successfully preempted
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:12:47 S slow
mahmood@server:~$ mjobctl -r 817
cannot resume non-suspended job
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:13:20 R slow

However, I don't know why it says "cannot resume non-suspended job". As you can see, the state of the job changed from S back to R.

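For anyone who wants to check the state column from a script instead of reading it by eye, here is a minimal Python sketch. It assumes the default six-column qstat layout shown in the listings above (state letter in the second-to-last column); job_states and SAMPLE are hypothetical names used only for illustration, and the sample text is copied from the output above instead of being fetched from a live server.

```python
def job_states(qstat_text):
    """Map numeric job id -> state letter, assuming default qstat columns."""
    states = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows look like: 817.server job2 mahmood 02:13:20 R slow
        # Skip the header, the dashed separator, and shell prompts.
        if len(fields) < 6 or not fields[0].split(".")[0].isdigit():
            continue
        job_id = fields[0].split(".")[0]
        states[job_id] = fields[-2]  # state is the second-to-last column
    return states


# Sample text taken from the listing after "mjobctl -s 817".
SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:12:47 S slow
"""

print(job_states(SAMPLE))  # {'344': 'R', '817': 'S'}
```

Feeding it the listing taken after qhold 817 would report R for both jobs, which matches the observation that qhold had no visible effect here.
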
// Naderan *Mahmood;

________________________________
From: Steve Johnson <[email protected]>
To: [email protected]
Cc: maui <[email protected]>
Sent: Sat, January 15, 2011 9:16:11 PM
Subject: Re: [Mauiusers] SIGSTOP and SIGTSTP don't work

The mjobctl command is a scheduler command - it will send STOP/CONT signals to the job and Torque will know about it. In effect, you're doing manual preemption.

With qhold/qrls, you're telling a running job to checkpoint and vacate the node (if supported) and enter the HELD state, or for a queued job to enter a HELD state and not be scheduled for execution until it's been qrls'd to the QUEUED state.

// Steve

On 01/15/2011 11:25 AM, [email protected] wrote:
>
> Steve> I think what you're looking for is mjobctl.
> Steve> mjobctl -s jobID # suspend
> Steve> mjobctl -r jobID # resume
>
> Why is it that Torque and Maui seem to have overlapping commands? What's
> the difference between the use of the mjobctl commands you referenced above
> and the qhold/qrls jobs of Torque?
>
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
