Hi,
Some of the jobs submitted to torque via GRAM are killed after about
24 hours in the queue, all with a similar message in the Globus logs:
2009-07-10 11:32:16,052 INFO exec.StateMachine
[RunQueueThread_5,logJobFailed:3250] Job 74bd3c60-6c17-11de-9a06-9ba1d1ebd14a
failed. Description: Couldn't obtain a delegated credential. Cause:
org.globus.exec.generated.FaultType: Couldn't obtain a delegated credential.
caused by [0: org.oasis.wsrf.faults.BaseFaultType: Error getting delegation
resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
Torque reports exit status = 271 (exceeds resource limit or killed by
user), but none of the "problematic" jobs seems to exceed any
limits. Moreover, we have had a lot of jobs that ran for longer than
24 hours and completed successfully (sometimes users just re-submitted
a job with the same description, using exactly the same tools, and it
completed without any problems).
All problematic jobs were submitted with the globusrun-ws tool.
Could anyone explain what is going on here?
Currently we use the Globus version from VDT 1.10; we started with VDT 1.6.
From looking in the logs, we have had the same problem for over a year,
but not many people are affected and most just re-submit without
reporting.
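In case it helps anyone check their own logs for the same failure, here is a minimal Python sketch of how the affected job IDs could be pulled out. It assumes the log entries look like the one quoted above (possibly wrapped across lines); the function name find_delegation_failures is only illustrative and is not part of Globus.

```python
import re

# Matches a job-failure entry like the one quoted above. The pattern is
# an assumption based on that single sample, not on a documented format.
FAILURE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO\s+exec\.StateMachine\s+"
    r"\[[^\]]*logJobFailed:\d+\]\s+Job\s+([0-9a-f-]{36})\s+failed\.\s+"
    r"Description:\s+Couldn't obtain a delegated credential"
)


def find_delegation_failures(log_text):
    """Return (timestamp, job_id) pairs for delegated-credential failures."""
    # Collapse all whitespace so entries wrapped over several lines still match.
    flat = " ".join(log_text.split())
    return [(m.group(1), m.group(2)) for m in FAILURE_RE.finditer(flat)]
```

The returned job IDs and timestamps can then be compared with the Torque accounting records for jobs that ended with exit status 271.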
Cheers,
Yuriy