Hi all,

I have a situation in which I submit a GramJob to an SGE jobmanager and run it in a specific directory (which I create just before submitting the job). Sometimes I want to cancel the job, so I call GramJob.cancel(). Directly after this call I remove the directory the job ran in. Sometimes this leads to a situation in which the directory is indeed deleted, but the job keeps running. The SGE jobmanager log shows this:

03/06/2008 17:23:28|qmaster|fs0|W|job 176632.1 failed on host xxx general opening input/output file because: 03/06/2008 17:23:28 [1001:23735]: error: can't open output file "xxx/17249.1204820556/stdout": Stale NFS file handle

The stale NFS file handle is probably the reason the job isn't properly cancelled. I understand that cancellation takes some time and that I have to wait for it to finish before deleting the directory. Is there any way to know when GramJob.cancel() has completed? Can I catch a status change in my handleStatusChange method (I do implement GramJobListener)? Which status indicates a successful cancellation? Or can I somehow poll to find out whether the cancellation is done?
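To make the question concrete, here is a minimal sketch of the "wait for a terminal status before deleting the directory" pattern. It assumes the listener callback receives every status change. JobState, TerminalStateWaiter, onStateChange and awaitTerminal are hypothetical names, not part of the Globus API. Because I'm not sure which status a cancelled job reports, the sketch treats both DONE and FAILED as terminal.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the job states a GRAM listener might report
// (not the real Globus constants).
enum JobState { PENDING, ACTIVE, DONE, FAILED }

// Records status callbacks and lets the caller block until the job reaches
// a terminal state (DONE or FAILED) or a timeout expires.
class TerminalStateWaiter {
    private final CountDownLatch terminal = new CountDownLatch(1);
    private volatile JobState lastState;

    // Hypothetical hook: call this from the listener's status-change method.
    void onStateChange(JobState state) {
        lastState = state;
        // Assumption: a cancelled job ends in either DONE or FAILED.
        if (state == JobState.DONE || state == JobState.FAILED) {
            terminal.countDown(); // releases any thread blocked in awaitTerminal
        }
    }

    // Returns true if a terminal state was seen before the timeout elapsed.
    boolean awaitTerminal(long timeout, TimeUnit unit) throws InterruptedException {
        return terminal.await(timeout, unit);
    }

    JobState lastState() {
        return lastState;
    }
}
```

The intended use would be: call cancel(), then awaitTerminal(...) with a generous timeout, and only remove the job directory if it returns true.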

Can anyone help me with this?

Thanks in advance,

Roelof
