Hi everyone,

I used blcr-0.7.3 and torque (torque-2.4.0-snap.200809111541.tar.gz) to test the checkpoint/restart function, following the wiki: http://www.clusterresources.com/wiki/doku.php?id=torque:2.6_job_checkpoint_and_restart

I ran into an interesting problem. When I qhold the job, the checkpoint file appears at /var/spool/torque/checkpoint/4817.node24.CK/ckpt.4817.node24.1221666102, but when I qrls the same job (4817), the pbs_mom daemon on the compute node goes down (killed by something).

Any clues? Thank you very much.

dolphin ,qin
