Good day all,

I have installed Torque/Maui for an opportunistic grid based on gLite 3.2,
and I have a problem when a node that is running a job crashes. After a few
moments PBS seems to realize that the node is “down” and that the job can't
continue, BUT it doesn't relaunch it. The job still appears as Running on
the “down” resource! I've tried to relaunch or reallocate it manually, but
the scheduler doesn't let me. When I cancel it by force, the job ends up in
either the “Deferred” or the “BatchHold” state, and neither releasehold nor
qrls changes that.

I've tried to find a Torque or Maui parameter that relaunches a job
allocated on a “down” resource, but I have only found Moab parameters.

Can you help me out? I would very much appreciate your help.

Nathalia

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
