Hi!
> On 15 Oct 2015, at 4:38 , Ralph Castain <[email protected]> wrote:
>
> Okay, please try the attached patch. It will cause two messages to be output
> for each job: one indicating the job has been marked terminated, and the
> other reporting that the completion message was sent to the requestor. Let's
> see what that tells us.
In this run of 42 tasks, 6 did not return, so 36 completed successfully.
$ grep TERMINATED dvm_output-patched.txt |wc -l
72
$ grep NOTIFYING dvm_output-patched.txt |wc -l
36
$ grep "Releasing job data" dvm_output-patched.txt |wc -l
77
$ grep "sess_dir_finalize" dvm_output-patched.txt |wc -l
36
$ grep "Releasing job data for.*," dvm_output-patched.txt|sort -k4 -t"," -n|wc -l
35
Interestingly, this is 35 and not 36.
$ grep "Releasing job data for.*," dvm_output-patched.txt|sort -k4 -t"," -n|head
[netbook:06716] [[9528,0],0] Releasing job data for [9528,2]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,8]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,9]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,10]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,12]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,13]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,14]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,15]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,16]
[netbook:06716] [[9528,0],0] Releasing job data for [9528,17]
This means tasks 1, 3, 4, 5, 6, 7 and 11 didn't return, which shows a clear
bias towards the "early" tasks.
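
In case it is useful for later runs, here is a rough sketch of how the list of
missing task numbers could be derived from the log instead of being read off by
eye. The function name missing_jobs is made up for illustration, and it assumes
the "Releasing job data for [family,number]" line format shown in the output
above and that tasks are numbered from 1 up to the total.

```shell
# Sketch only: print every job number in 1..total that has no
# "Releasing job data for [family,number]" line in the log.
# missing_jobs is a hypothetical helper, not part of Open MPI.
missing_jobs() {
    log=$1
    total=$2
    # Pull out the number after the comma; lines without one are skipped.
    released=$(sed -n 's/.*Releasing job data for \[[0-9]*,\([0-9]*\)\].*/\1/p' "$log" | sort -n | uniq)
    i=1
    while [ "$i" -le "$total" ]; do
        # -x matches whole lines only, so "1" does not match "10" or "11".
        printf '%s\n' "$released" | grep -qx "$i" || printf '%s\n' "$i"
        i=$((i + 1))
    done
}
```

For this run it would be called as: missing_jobs dvm_output-patched.txt 42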
Hopefully this gives you more insight.
Thanks!
Mark