Joseph Bester wrote:
On Oct 20, 2008, at 7:40 PM, John Sanabria wrote:
Hi,
I'm developing a platform for executing jobs using the traditional Globus
commands, such as 'globus-job-run' and 'globus-job-submit'. When a user
requests an asynchronous execution, the platform periodically queries
the remote resource for the job status using the 'globus-job-status'
command.
I'm executing tasks lasting more than 5 days.
I have noticed that about a day or less after I start the execution,
the 'globus-job-status' command returns 'DONE' even though the job
hasn't finished.
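
For reference, the periodic status query described above could be sketched roughly as follows. This is only an illustration, not the platform's actual code: it assumes that 'globus-job-status' prints a single state word (such as ACTIVE or DONE) on standard output, and the names poll_job, query_status and interval are hypothetical. Recording each state change with a timestamp may help pin down when the premature 'DONE' first appears.

```python
import subprocess
import time

# States after which polling stops (assumed to be what the command prints).
TERMINAL_STATES = {"DONE", "FAILED"}


def query_status(job_contact):
    """Run globus-job-status for one job contact and return the printed state."""
    result = subprocess.run(
        ["globus-job-status", job_contact],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def poll_job(job_contact, interval=300.0, query=query_status,
             sleep=time.sleep, clock=time.time):
    """Poll until a terminal state is reported.

    Returns a list of (timestamp, state) pairs, one per state change.
    """
    transitions = []
    last_state = None
    while True:
        state = query(job_contact)
        if state != last_state:
            # Record only changes, so the log shows when each state began.
            transitions.append((clock(), state))
            last_state = state
        if state in TERMINAL_STATES:
            return transitions
        sleep(interval)
```

The query, sleep and clock parameters exist only so the loop can be exercised without a real GRAM resource.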
Is this normal behavior? I read the paper 'The GridWay Framework for
Adaptive Scheduling and Execution on Grids' and found this:
"The job manager is probed periodically at each polling. If the job
manager does not respond, the GRAM gatekeeper is probed. If the
gatekeeper responds, a new job manager is started to resume watching
over the job. If the gatekeeper fails to respond..."
According to that, I think this behavior is not abnormal, but I don't
know how to query the GRAM gatekeeper, or what message to send to it to
request that it start a new job manager to watch a job.
I appreciate your comments, advice and pointers to documentation
about this topic.
Cheers,
I wonder if the proxy you have delegated to the GRAM is expiring after
a day? Are you creating a proxy with a long enough lifetime to last
for the whole job?
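
One way to check this before submitting might look like the sketch below. It assumes that 'grid-proxy-info -timeleft' prints the remaining lifetime in seconds as an integer; the function names are hypothetical. Note that it only inspects the local proxy, and the credential actually delegated to the remote side may have a shorter lifetime.

```python
import subprocess


def proxy_seconds_left(run=subprocess.run):
    """Return the local proxy's remaining lifetime in seconds, or None if unknown."""
    result = run(
        ["grid-proxy-info", "-timeleft"],
        capture_output=True,
        text=True,
        check=False,
    )
    try:
        return int(result.stdout.strip())
    except ValueError:
        # No proxy found, or the output was not a plain integer.
        return None


def proxy_outlasts_job(expected_job_seconds, seconds_left):
    """True if the proxy is expected to remain valid for the whole job."""
    return seconds_left is not None and seconds_left >= expected_job_seconds
```

For a five-day job, the expected duration passed in would be 5 * 24 * 3600 seconds.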
Joe
Hi Joe,
Yes, I did; the credential will expire in June 2009.
John