>There is a Torque command named 'qhold' which will hold running Torque jobs.
>Try 'man qhold'.
Please see the output:

mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:13:20 R slow

mahmood@server:~$ qhold 817

 
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:13:51 R slow

 
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:14:36 R slow

I ran the last command one minute later and, as you can see, the job is still
running. The "top" command also shows that the process is active.

>With qhold/qrls, you're telling a running job 
>to checkpoint and vacate the node (if supported) and enter the HELD 
>state, or for a queued job to enter a HELD state and not be scheduled 
>for execution until it's been qrls'd to the QUEUED state.
It seems that this is not supported by my job manager; qhold has no effect.

 
>mjobctl -s jobID  # suspend
>mjobctl -r jobID  # resume
Thanks for that. It really does suspend the job:

mahmood@server:~$ mjobctl -s 817
job 817 successfully preempted
 
mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:12:47 S slow

mahmood@server:~$ mjobctl -r 817
cannot resume non-suspended job

mahmood@server:~$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:13:20 R slow

However, I don't know why it says "cannot resume non-suspended job". As you can
see, the state of the job changed from S to R.
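
For checking the state column from a script rather than by eye, something like
the following might help. It is only a sketch: job_state is a hypothetical
helper name, and it assumes the default qstat layout shown above, where the
state is the fifth whitespace-separated field. The sample text is copied from
the output above so the snippet runs without a Torque server.

```shell
# Hypothetical helper: print the state column for the job whose id
# starts with "<number>." (for example "817.server").
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $5 }'
}

# Sample qstat output copied from this message (assumed default layout).
sample='Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
344.server                job1             mahmood         62:51:59 R slow
817.server                job2             mahmood         02:12:47 S slow'

state=$(printf '%s\n' "$sample" | job_state 817)
echo "job 817 state: $state"
```

On a live system the same helper could presumably be fed directly, as in
"qstat | job_state 817", before and after calling mjobctl.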
 
// Naderan *Mahmood;




________________________________
From: Steve Johnson <[email protected]>
To: [email protected]
Cc: maui <[email protected]>
Sent: Sat, January 15, 2011 9:16:11 PM
Subject: Re: [Mauiusers] SIGSTOP and SIGTSTP don't work

The mjobctl command is a scheduler command - it will send STOP/CONT 
signals to the job and Torque will know about it.  In effect, you're 
doing manual preemption.  With qhold/qrls, you're telling a running job 
to checkpoint and vacate the node (if supported) and enter the HELD 
state, or for a queued job to enter a HELD state and not be scheduled 
for execution until it's been qrls'd to the QUEUED state.
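
To illustrate what STOP/CONT signals do to a process, independent of Maui or
Torque, here is a minimal sketch. It assumes a Linux system with /proc mounted,
and it uses a plain "sleep" process as a stand-in for a job; it does not show
how the scheduler itself delivers the signals.

```shell
# Start a harmless background process as a stand-in for a running job.
sleep 30 &
pid=$!

# Read the one-letter process state (third field of /proc/<pid>/stat on Linux).
proc_state() {
    read -r _pid _comm state _rest < "/proc/$1/stat"
    echo "$state"
}

# Suspend the process; SIGSTOP cannot be caught or ignored.
kill -STOP "$pid"
sleep 1
state_stopped=$(proc_state "$pid")
echo "after SIGSTOP: $state_stopped"   # T means stopped

# Resume it again.
kill -CONT "$pid"
sleep 1
state_resumed=$(proc_state "$pid")
echo "after SIGCONT: $state_resumed"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A stopped process keeps its memory and its place on the node; it simply stops
receiving CPU time until SIGCONT arrives, which is why this differs from a
hold that checkpoints the job and vacates the node.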

// Steve

On 01/15/2011 11:25 AM, [email protected] wrote:
>
>      Steve>  I think what you're looking for is mjobctl.
>      Steve>  mjobctl -s jobID  # suspend
>      Steve>  mjobctl -r jobID  # resume
>
> Why is it that Torque and Maui seem to have overlapping commands?  What's
> the difference between the use of the mjobctl commands you referenced above
> and the qhold/qrls jobs of Torque?
>
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers



      