Hi all,
I have a situation in which I submit a GramJob to an SGE jobmanager and
run it in a specific directory (which I create just before submitting
the job). Sometimes I want to cancel the job, so I call
GramJob.cancel(). Directly after this call I remove the directory the
job was running in. Sometimes the directory is indeed deleted, but the
job keeps running. The log of the SGE jobmanager tells me this:
03/06/2008 17:23:28|qmaster|fs0|W|job 176632.1 failed on host xxx
general opening input/output file because: 03/06/2008 17:23:28
[1001:23735]: error: can't open output file
"xxx/17249.1204820556/stdout": Stale NFS file handle
The stale NFS file handle is probably the reason the job isn't
properly cancelled. I understand that the cancellation takes some time
and that I have to wait for it to finish before deleting the directory.
Is there any way to know when GramJob.cancel() is done? Can I catch a
status change in the handleStatusChange method (I do implement
GramJobListener)? Which status indicates a successful cancellation? Or
can I somehow poll to find out whether the cancellation is done?
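
To make the question concrete, here is a minimal sketch of the kind of
waiting I have in mind. It does not use the Globus classes at all:
CancelWaiter, onStatus and awaitTerminal are hypothetical names, and the
status values are assumptions standing in for whatever the GRAM client
library defines. Since I don't know which status a cancelled job ends
in, the sketch treats both "failed" and "done" as terminal.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Blocks a caller until a job has reported a terminal status. */
public class CancelWaiter {
    // Assumed status codes; replace with the constants from the GRAM library.
    public static final int STATUS_ACTIVE = 2;
    public static final int STATUS_FAILED = 4;
    public static final int STATUS_DONE = 8;

    private final CountDownLatch terminal = new CountDownLatch(1);
    private volatile int lastStatus = -1;

    /** To be called from the status listener with the job's current status. */
    public void onStatus(int status) {
        lastStatus = status;
        if (status == STATUS_FAILED || status == STATUS_DONE) {
            terminal.countDown(); // release anyone blocked in awaitTerminal
        }
    }

    /** Returns true if a terminal status arrived within timeoutMillis. */
    public boolean awaitTerminal(long timeoutMillis) {
        try {
            return terminal.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    /** Last status seen, or -1 if none has been reported yet. */
    public int lastStatus() {
        return lastStatus;
    }
}
```

The idea would be to call cancel(), then awaitTerminal(...) with some
timeout, and only remove the directory once it returns true.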
Can anyone help me with this?
Thanks in advance,
Roelof