On January 20, 2008, I started a "resend_lost_results improvement?" thread here, which eventually came down to what looked like some practical methods to avoid the problem. Unfortunately none have been implemented, and the SETI@home database may now have on the order of 1 million result records marked "in progress" which are not actually in progress.

In addition to ideas discussed then, I've observed on one of my systems a sequence which the logic in sched_result.cpp could easily use to flag a need to check what is actually on the host against the database. Here are some messages from host 2818173 <URL: http://setiathome.berkeley.edu/show_host_detail.php?hostid=2818173 > and trimmed selections from its web task list:

-----------------------------------------------------------
8/9/10 22:01:08 SETI@home work fetch resumed by user
8/9/10 22:01:39 Resuming network activity
8/9/10 22:01:39 SETI@home Sending scheduler request: To fetch work.
8/9/10 22:01:39 SETI@home Reporting 2 completed tasks, requesting new tasks
8/9/10 22:06:47 SETI@home Scheduler request failed: Timeout was reached
8/9/10 22:07:47 SETI@home Sending scheduler request: To fetch work.
8/9/10 22:07:47 SETI@home Reporting 2 completed tasks, requesting new tasks
8/9/10 22:09:25 SETI@home Scheduler request completed: got 3 new tasks
8/9/10 22:09:28 SETI@home Started download of ap_06my10af_B4_P0_00047_20100809_26735.wu
8/9/10 22:09:28 SETI@home Started download of ap_06my10af_B4_P0_00039_20100809_26735.wu
8/9/10 22:09:30 SETI@home Started download of ap_12ja10aa_B4_P1_00299_20100720_12222.wu

tasks for computer 2818173

Task        Work unit  Sent                     Status       Application
1680067243  635381158  10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
1680067092  642626371  10 Aug 2010 2:09:49 UTC  In progress  Astropulse v505
1680066966  642626352  10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
1680062556  642625860  10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
1680062552  642515167  10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
1680062479  642625851  10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505

Task        Work unit  Reported                 Status       Application
1677856015  641623392  10 Aug 2010 2:03:36 UTC  Completed    SETI@home Enhanced
1677855960  641623387  10 Aug 2010 2:03:36 UTC  Completed    SETI@home Enhanced
-----------------------------------------------------------

So the first
Scheduler request both successfully reported two completions and had three new Astropulse tasks "Sent" at 2:03:36 UTC, but that reply never reached my system. The point is that the next Scheduler request re-reported the same tasks, and that's prima facie evidence that my system had not received the previous reply. Of course not all requests are accompanied by reported completions, nor do all replies "Send" work, but it seems a shame not to act on that evidence.

I of course also preferred the sched_result.cpp logic which used to send messages back to the client saying "Completed result %s refused: result already reported as success". Changeset 21671 removed that information, and even logging it on the server now requires debug_handle_results to be set. It's very much like sweeping dirt under the rug.
--
Joe
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
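
To make the suggestion concrete, here is a rough sketch of the kind of check described above. It is not the actual sched_result.cpp logic: ResultState, HostResults, previous_reply_probably_lost() and results_missing_on_host() are all hypothetical names invented for illustration, and the database is reduced to an in-memory map keyed by task ID. It also assumes the server can learn which tasks the host really holds.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified result states; not the real BOINC definitions.
enum class ResultState { InProgress, ReportedSuccess };

// Hypothetical stand-in for one host's result records, keyed by task ID.
using HostResults = std::map<std::string, ResultState>;

// True if the request reports any task the database already holds as
// successfully reported. A client only re-reports when it never saw the
// reply acknowledging the first report, so this hints at a lost reply.
bool previous_reply_probably_lost(const HostResults& db,
                                  const std::vector<std::string>& reported) {
    for (const auto& id : reported) {
        auto it = db.find(id);
        if (it != db.end() && it->second == ResultState::ReportedSuccess) {
            return true;
        }
    }
    return false;
}

// Tasks the database shows as in progress but which the host does not
// actually hold; these are the candidates to resend or to clear.
std::vector<std::string> results_missing_on_host(
        const HostResults& db, const std::set<std::string>& on_host) {
    std::vector<std::string> missing;
    for (const auto& entry : db) {
        if (entry.second == ResultState::InProgress &&
            on_host.count(entry.first) == 0) {
            missing.push_back(entry.first);
        }
    }
    return missing;
}
```

In the sequence shown above, the second request re-reports 1677856015 and 1677855960, so the first function would return true, and comparing against the host's real task list would then turn up 1680062556, 1680062552 and 1680062479 as "in progress" in the database only. The comparison runs only when that evidence is present, so most requests would skip it.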
