I noticed this line:
8/9/10 22:06:47 SETI@home Scheduler request failed: Timeout was reached
If the client used that failure to set a flag, sent to the server on the
next connection (and ONLY the next connection to that project), asking
for a search for missing results, the server could do that search in a
much more focused manner than the original method of running it on every
single connection. This might reduce the DB cost enough to make it
workable on S@H.
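A rough sketch of what I mean, not actual BOINC code: ProjectState,
SchedulerRequest, the check_lost_results field and all function names
below are made up for illustration, and result names stand in for
whatever the client and database really exchange.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-project client state (not BOINC's real PROJECT struct).
struct ProjectState {
    bool last_request_timed_out = false;
};

// Hypothetical scheduler request; check_lost_results is an assumed new field.
struct SchedulerRequest {
    bool check_lost_results = false;
    std::vector<std::string> results_on_host;
};

// Client: called when a scheduler request fails with a timeout.
void on_request_failed(ProjectState& p) {
    p.last_request_timed_out = true;
}

// Client: build the next request. The flag is one-shot, so it is cleared
// here; if this request times out too, on_request_failed() sets it again.
SchedulerRequest build_request(ProjectState& p,
                               std::vector<std::string> results_on_host) {
    SchedulerRequest req;
    req.check_lost_results = p.last_request_timed_out;
    req.results_on_host = std::move(results_on_host);
    p.last_request_timed_out = false;
    return req;
}

// Server: compare the database against the host only when the flag is set.
// in_progress_in_db stands in for the result of the (expensive) DB query,
// which a real server would skip entirely when the flag is clear.
std::vector<std::string> find_lost_results(
        const SchedulerRequest& req,
        const std::vector<std::string>& in_progress_in_db) {
    std::vector<std::string> lost;
    if (!req.check_lost_results) return lost;
    std::set<std::string> on_host(req.results_on_host.begin(),
                                  req.results_on_host.end());
    for (const auto& name : in_progress_in_db) {
        if (on_host.count(name) == 0) lost.push_back(name);
    }
    assert(lost.size() <= in_progress_in_db.size());
    return lost;
}
```

The one-shot flag is the whole point: a host that never sees a timeout
never triggers the comparison, so the cost is paid only on the
connection right after a failure.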
jm7

From: "Josef W. Segur" <jse...@westelcom.com>
Sent by: <boinc_dev-bounce[email protected]u>
Date: 08/11/2010 08:47 AM
To: David Anderson <[email protected]>, [email protected]
cc:
Subject: [boinc_dev] Lost tasks
On January 20, 2008, I started a "resend_lost_results improvement?" thread
here, which eventually came down to what looked like some practical
methods to avoid the problem. Unfortunately none have been implemented,
and the S@H database may now have on the order of 1 million result
records marked "in progress" which are not actually in progress.
In addition to the ideas discussed then, I've observed on one of my
systems a sequence which the logic in sched_result.cpp could easily use
to flag a need to check what is actually on the host against the
database. Here are some messages from host 2818173
<URL: http://setiathome.berkeley.edu/show_host_detail.php?hostid=2818173 >
and trimmed selections from its web task list:
-----------------------------------------------------------
8/9/10 22:01:08 SETI@home work fetch resumed by user
8/9/10 22:01:39 Resuming network activity
8/9/10 22:01:39 SETI@home Sending scheduler request: To fetch work.
8/9/10 22:01:39 SETI@home Reporting 2 completed tasks, requesting new tasks
8/9/10 22:06:47 SETI@home Scheduler request failed: Timeout was reached
8/9/10 22:07:47 SETI@home Sending scheduler request: To fetch work.
8/9/10 22:07:47 SETI@home Reporting 2 completed tasks, requesting new tasks
8/9/10 22:09:25 SETI@home Scheduler request completed: got 3 new tasks
8/9/10 22:09:28 SETI@home Started download of ap_06my10af_B4_P0_00047_20100809_26735.wu
8/9/10 22:09:28 SETI@home Started download of ap_06my10af_B4_P0_00039_20100809_26735.wu
8/9/10 22:09:30 SETI@home Started download of ap_12ja10aa_B4_P1_00299_20100720_12222.wu

Tasks for computer 2818173

Task        Work unit  Sent                     Status       Application
1680067243  635381158  10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
1680067092  642626371  10 Aug 2010 2:09:49 UTC  In progress  Astropulse v505
1680066966  642626352  10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
1680062556  642625860  10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
1680062552  642515167  10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
1680062479  642625851  10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505

Task        Work unit  Reported                 Status       Application
1677856015  641623392  10 Aug 2010 2:03:36 UTC  Completed    SETI@home Enhanced
1677855960  641623387  10 Aug 2010 2:03:36 UTC  Completed    SETI@home Enhanced
-----------------------------------------------------------
So the first scheduler request did reach the server: it successfully
reported the two completions, and three new Astropulse tasks were "Sent"
at 2:03:36 UTC, but that reply never got to my system.
The point is that the next scheduler request re-reported the same tasks,
and that's prima facie evidence that my system had not received the
previous reply. Of course not all requests are accompanied by reported
completions, nor do all replies "Send" work, but it seems a shame not to
act on that evidence.
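To make the check concrete, here is a minimal sketch under my own
assumptions; it is not the actual sched_result.cpp logic. ResultState,
count_rereported and should_check_for_lost_results are hypothetical
names, and the map stands in for a database lookup by result name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for what the database knows about a result.
enum class ResultState { InProgress, AlreadyReported };

// Count the results in this request that the database already holds as
// reported. Names unknown to the database are ignored.
std::size_t count_rereported(
        const std::vector<std::string>& reported_names,
        const std::unordered_map<std::string, ResultState>& db) {
    std::size_t n = 0;
    for (const auto& name : reported_names) {
        auto it = db.find(name);
        if (it != db.end() && it->second == ResultState::AlreadyReported) {
            ++n;
        }
    }
    assert(n <= reported_names.size());
    return n;
}

// A re-report means the host never saw the earlier reply, so any work
// "Sent" in that reply may be lost and is worth checking for.
bool should_check_for_lost_results(
        const std::vector<std::string>& reported_names,
        const std::unordered_map<std::string, ResultState>& db) {
    return count_rereported(reported_names, db) > 0;
}
```

In the sequence above, the 22:07:47 request would report the two
results already recorded at 2:03:36 UTC, so a check like this would
trigger the host-versus-database comparison for that one request only.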
I of course also preferred the old sched_result.cpp logic, which sent
messages back to the client saying "Completed result %s refused: result
already reported as success". Changeset 21671 removed that message, and
even logging it on the server now requires debug_handle_results to be
set. It's very much like sweeping dirt under the rug.
--
Joe
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.