Good day all,
I have installed Torque/Maui for an opportunistic grid based on gLite 3.2. I have a problem when a node that is running a job crashes. After a few moments PBS seems to realize that the node is down and that the job can't continue, BUT it doesn't re-launch the job. The job still appears as Running on the down resource! I've tried to re-launch or reallocate it manually, but the scheduler doesn't let me. When I cancel it by force, the job stays in either the Deferred or the BatchHold state, and neither releasehold nor qrls changes it.

I've tried to find a Torque or MAUI parameter that relaunches a job allocated on a down resource, but I only found MOAB parameters. Can you help me out?

I would very much appreciate your help,
Nathalia
_______________________________________________ mauiusers mailing list [email protected] http://www.supercluster.org/mailman/listinfo/mauiusers
