Hi,
 
 Some of the jobs submitted to Torque via GRAM are killed after about
 24 hours in the queue, all with a similar message in the Globus logs:
 
 2009-07-10 11:32:16,052 INFO  exec.StateMachine 
[RunQueueThread_5,logJobFailed:3250] Job 74bd3c60-6c17-11de-9a06-9ba1d1ebd14a 
failed. Description: Couldn't obtain a delegated credential. Cause: 
org.globus.exec.generated.FaultType: Couldn't obtain a delegated credential. 
caused by [0: org.oasis.wsrf.faults.BaseFaultType: Error getting delegation 
resource [Caused by: org.globus.wsrf.NoSuchResourceException]]
 
 Torque reports exit status = 271 (exceeds resource limit or killed by
 user), but none of the "problematic" jobs seem to exceed any
 limits. Moreover, we have had a lot of jobs that ran for longer than
 24 hours and completed successfully (sometimes users just re-submitted
 jobs with the same description, using exactly the same tools, and
 they completed without any problems).
 
 All problematic jobs were submitted with the globusrun-ws tool.
 
 Could anyone explain what is going on here? 
 
 
 We currently use the Globus version from VDT 1.10, having started with
 VDT 1.6. Judging from the logs, we have had the same problem for over
 a year, but not many people are affected and most just re-submit
 without reporting it.
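
 For anyone who wants to check their own logs for the same failure,
 here is a minimal Python sketch. It assumes each log record sits on a
 single line in the format quoted above (the record in this mail is
 wrapped by the mail client); the function name
 find_delegation_failures is hypothetical, not part of any Globus tool.

```python
import re
from datetime import datetime

# Assumption: one log record per line, laid out like the record quoted above.
FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+INFO\s+"
    r"exec\.StateMachine\s+\[[^\]]*logJobFailed:\d+\]\s+"
    r"Job\s+(?P<job>[0-9a-f-]{36})\s+failed\.\s+"
    r"Description: Couldn't obtain a delegated credential"
)


def find_delegation_failures(lines):
    """Return (timestamp, job_id) pairs for jobs that failed with the
    "Couldn't obtain a delegated credential" error."""
    hits = []
    for line in lines:
        match = FAILED.search(line)
        if match:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            hits.append((when, match.group("job")))
    return hits
```

 Comparing each failure timestamp with the submission time of the same
 job should show whether the failures really cluster around a fixed job
 age.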
 
 Cheers,
 Yuriy
 
