On January 20, 2008, I started a "resend_lost_results improvement?" thread
here, which eventually arrived at what looked like some practical
methods to avoid the problem. Unfortunately, none have been implemented,
and the s...@h database may now hold on the order of 1 million result
records marked "in progress" that are not actually in progress.

In addition to the ideas discussed then, I've observed on one of my systems
a sequence that the logic in sched_result.cpp could easily use to flag
a need to check what is actually on the host against the database. Here
are some messages from host 2818173
<URL: http://setiathome.berkeley.edu/show_host_detail.php?hostid=2818173 >
and trimmed selections from its web task list:

-----------------------------------------------------------
8/9/10 22:01:08 s...@home   work fetch resumed by user
8/9/10 22:01:39     Resuming network activity
8/9/10 22:01:39 s...@home   Sending scheduler request: To fetch work.
8/9/10 22:01:39 s...@home   Reporting 2 completed tasks, requesting new tasks
8/9/10 22:06:47 s...@home   Scheduler request failed: Timeout was reached
8/9/10 22:07:47 s...@home   Sending scheduler request: To fetch work.
8/9/10 22:07:47 s...@home   Reporting 2 completed tasks, requesting new tasks
8/9/10 22:09:25 s...@home   Scheduler request completed: got 3 new tasks
8/9/10 22:09:28 s...@home   Started download of ap_06my10af_B4_P0_00047_20100809_26735.wu
8/9/10 22:09:28 s...@home   Started download of ap_06my10af_B4_P0_00039_20100809_26735.wu
8/9/10 22:09:30 s...@home   Started download of ap_12ja10aa_B4_P1_00299_20100720_12222.wu

Tasks for computer 2818173

Task        Work unit   Sent                     Status       Application
1680067243  635381158   10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
1680067092  642626371   10 Aug 2010 2:09:49 UTC  In progress  Astropulse v505
1680066966  642626352   10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
1680062556  642625860   10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
1680062552  642515167   10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
1680062479  642625851   10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505


Task        Work unit   Reported                 Status       Application
1677856015  641623392   10 Aug 2010 2:03:36 UTC  Completed    s...@home Enhanced
1677855960  641623387   10 Aug 2010 2:03:36 UTC  Completed    s...@home Enhanced
-----------------------------------------------------------

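So the first scheduler request successfully reported the two completed
tasks, and three new Astropulse tasks were marked "Sent" at 2:03:36 UTC,
but that reply never reached my system. The host's clock runs at UTC-4
(the second reply, logged at 22:09:25 local, corresponds to tasks "Sent"
at 2:09:49-50 UTC), so 2:03:36 UTC falls inside the 22:01:39 to 22:06:47
local window during which the client was waiting before it timed out. The
database therefore shows six Astropulse tasks in progress on this host,
while only three were actually downloaded.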
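The timing argument can be checked with a little arithmetic. This is a sketch
under two assumptions: the host's clock is UTC-4 (inferred from the second
request, not stated in the logs), and there is no meaningful clock skew
between host and server. The helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the host clock is UTC-4 (inferred, not stated in the logs).
HOST_TZ = timezone(timedelta(hours=-4))
FMT = "%Y-%m-%d %H:%M:%S"

def host_time(s):
    """Parse a timestamp from the host's message log (local time)."""
    return datetime.strptime(s, FMT).replace(tzinfo=HOST_TZ)

def server_time(s):
    """Parse a timestamp from the web task list (UTC)."""
    return datetime.strptime(s, FMT).replace(tzinfo=timezone.utc)

# First scheduler request: sent at 22:01:39, timed out at 22:06:47 (host time).
request_sent = host_time("2010-08-09 22:01:39")
request_timed_out = host_time("2010-08-09 22:06:47")

# Server side: two reports recorded and three tasks "Sent" at 2:03:36 UTC.
reply_built = server_time("2010-08-10 02:03:36")

# Timezone-aware datetimes in different zones compare correctly.
reply_inside_window = request_sent <= reply_built <= request_timed_out
slack = (request_timed_out - reply_built).total_seconds()
print(reply_inside_window, slack)  # True 191.0
```

Under those assumptions the server built its reply a bit over three minutes
before the client gave up, which fits a reply that was generated but never
delivered.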

The point is that the next scheduler request re-reported the same tasks,
and that is prima facie evidence that my system had not received the
previous reply. Of course, not all requests carry reported completions, nor
do all replies "Send" work, but it seems a shame not to act on that
evidence when it is there.
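A minimal sketch of the check suggested above, written in Python rather than
the scheduler's C++. It assumes a simplified per-result state ("in_progress"
or "reported") and a set of results the client says it currently holds; the
function names and state strings are hypothetical, not BOINC's actual fields.
If any reported result is already recorded as reported, the previous reply is
presumed lost, and results the database shows in progress but the host does
not hold become candidates for resending. The data uses result IDs from the
trimmed task lists above, as they would stand when the second request arrives.

```python
# Hypothetical, simplified per-result state as the server might see it.
IN_PROGRESS = "in_progress"
REPORTED = "reported"

def reply_probably_lost(reported_ids, db_state):
    """True if the client reports a result the DB already shows as reported."""
    return any(db_state.get(rid) == REPORTED for rid in reported_ids)

def lost_results(db_state, held_by_host):
    """Results the DB thinks are in progress on this host but the host lacks."""
    return sorted(
        rid for rid, state in db_state.items()
        if state == IN_PROGRESS and rid not in held_by_host
    )

# DB state for host 2818173 when the second request (22:07:47 host time)
# arrives, using result IDs from the trimmed task lists.
db_state = {
    1677856015: REPORTED,
    1677855960: REPORTED,
    1680062556: IN_PROGRESS,
    1680062552: IN_PROGRESS,
    1680062479: IN_PROGRESS,
}
reported_again = [1677856015, 1677855960]
held_by_host = set()  # in this trimmed example, the 2:03:36 reply never arrived

if reply_probably_lost(reported_again, db_state):
    to_resend = lost_results(db_state, held_by_host)
    print(to_resend)  # [1680062479, 1680062552, 1680062556]
```

A first-time report of a result the server has not yet seen never trips the
check, so in this sketch the extra comparison only runs on requests that
carry the evidence.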

I also, of course, preferred the earlier sched_result.cpp logic, which
sent a message back to the client saying "Completed result %s refused:
result already reported as success". Changeset 21671 removed that message,
and now even logging the condition on the server requires
debug_handle_results to be set. It's very much like sweeping dirt under
the rug.
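For comparison, a sketch of the behaviour preferred above: every refused
duplicate report produces a message for the client and a server log line,
with no debug flag involved. This is Python, not the actual sched_result.cpp
code; the function name and the result name are hypothetical, and only the
message text comes from the old logic quoted above.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
log = logging.getLogger("sched_result_sketch")

def refuse_duplicate(result_name, client_messages):
    """Handle a completed result the DB already records as reported.

    The client is told why the report was refused, and the server
    records it unconditionally (no debug flag required).
    """
    text = ("Completed result %s refused: result already reported as success"
            % result_name)
    client_messages.append(text)  # would go back in the scheduler reply
    log.info(text)                # unconditional server-side record

reply_messages = []
refuse_duplicate("example_result_0", reply_messages)  # hypothetical name
```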
-- 
                                                Joe

_______________________________________________
boinc_dev mailing list
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev