Indeed, that would be a good client feature, but it should be applied
ONLY when the failed request asked for work. I believe most failed
requests that produce lost tasks actually get an HTTP error 500 rather
than a timeout, so perhaps that case could also be considered.
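
To make the rule concrete, here is a rough sketch of the client side. It
is not taken from the BOINC client source: RpcOutcome, ProjectState,
on_request_finished and take_lost_results_flag are names I made up for
illustration, and I am assuming the flag would simply ride along on the
next scheduler request to that project.

```cpp
#include <string>

// Hypothetical outcome of one scheduler request (not a BOINC type).
enum class RpcOutcome { Ok, Timeout, Http500, OtherError };

// Hypothetical per-project client state (not a BOINC type).
struct ProjectState {
    std::string url;
    bool check_lost_results = false;  // sent with the next request only
};

// Record how a scheduler request ended. The flag is set only when the
// failed request asked for work AND failed in a way that may hide a
// reply the server already committed (timeout or HTTP 500).
inline void on_request_finished(ProjectState& p, bool asked_for_work,
                                RpcOutcome outcome) {
    if (outcome == RpcOutcome::Ok) return;
    const bool reply_may_be_lost =
        outcome == RpcOutcome::Timeout || outcome == RpcOutcome::Http500;
    if (asked_for_work && reply_may_be_lost) {
        p.check_lost_results = true;
    }
}

// Called while building the next request to this project. Returns the
// flag and clears it, so only that one connection carries it.
inline bool take_lost_results_flag(ProjectState& p) {
    const bool flag = p.check_lost_results;
    p.check_lost_results = false;
    return flag;
}
```

A failed request that did not ask for work leaves the flag untouched, so
the server would not be asked to search in that case.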

With the delays involved in getting client changes adopted, which may be
even longer for 6.12.x than for earlier series because so many users
dislike the Manager changes, I hope the simpler and quicker server-side
change using detection in sched_result.cpp will also be implemented. If
both are implemented, the server-side detection would usually duplicate
the information from the client flag, but reported results are handled
before the work request, so there need be no difficulty there.
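
As a rough illustration of the server-side detection, here is a sketch.
It is not the actual sched_result.cpp code: RequestState,
handle_reported_results and the in-memory set standing in for the
database lookup are all invented for the example. The only idea it
shows is that a result reported a second time marks the request as
needing a lost-result check before work is assigned.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical per-request state on the scheduler (not a BOINC type).
struct RequestState {
    bool resend_check_needed = false;
};

// Handle the results reported in one request. `already_reported` stands
// in for the database: it holds the names of results the server has
// previously recorded as reported. Returns the number newly accepted.
inline int handle_reported_results(
        const std::vector<std::string>& reported,
        std::unordered_set<std::string>& already_reported,
        RequestState& st) {
    int accepted = 0;
    for (const auto& name : reported) {
        // insert().second is false when the name was already present,
        // i.e. this result is being reported again.
        if (!already_reported.insert(name).second) {
            // The host evidently missed an earlier reply; check what it
            // actually holds when the work request is processed.
            st.resend_check_needed = true;
        } else {
            ++accepted;
        }
    }
    return accepted;
}
```

Because this runs before the work request is handled, the flag is
already set by the time the scheduler decides what to send.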
-- 
                                                              Joe


On 11 Aug 2010 at 10:35, [email protected] wrote:

> I noticed this line:
> 
> 8/9/10 22:06:47 s...@home   Scheduler request failed: Timeout was reached
> 
> If that were used to set a flag sent to the server on the next connection
> (and ONLY the next connection to that project) that a search for missing
> results should be performed, the server could then do the missing results
> search in a much more focused manner than the original method of doing it
> for every single connection.  This might reduce the DB costs enough to make
> it work on s...@h.
> 
> jm7
> 
> 
> From:    "Josef W. Segur" <jse...@westelcom.com>
> Sent by: <boinc_dev-bounce[email protected]u>
> To:      David Anderson <[email protected]>, [email protected]
> Date:    08/11/2010 08:47 AM
> Subject: [boinc_dev] Lost tasks
>
> January 20, 2008 I started a "resend_lost_results improvement?" thread
> here, which eventually came down to what looked like some practical
> methods to avoid the problem. Unfortunately none have been implemented
> and the s...@h database may now have on the order of 1 million result
> records indicating "in progress" which are not actually so.
> 
> In addition to ideas discussed then, I've observed on one of my systems
> a sequence which the logic in sched_result.cpp could easily use to flag
> a need to check what is actually on the host against the database. Here
> are some messages from host 2818173
> <URL: http://setiathome.berkeley.edu/show_host_detail.php?hostid=2818173 >
> and trimmed selections from its web task list:
> 
> -----------------------------------------------------------
> 8/9/10 22:01:08 s...@home   work fetch resumed by user
> 8/9/10 22:01:39     Resuming network activity
> 8/9/10 22:01:39 s...@home   Sending scheduler request: To fetch work.
> 8/9/10 22:01:39 s...@home   Reporting 2 completed tasks, requesting new tasks
> 8/9/10 22:06:47 s...@home   Scheduler request failed: Timeout was reached
> 8/9/10 22:07:47 s...@home   Sending scheduler request: To fetch work.
> 8/9/10 22:07:47 s...@home   Reporting 2 completed tasks, requesting new tasks
> 8/9/10 22:09:25 s...@home   Scheduler request completed: got 3 new tasks
> 8/9/10 22:09:28 s...@home   Started download of ap_06my10af_B4_P0_00047_20100809_26735.wu
> 8/9/10 22:09:28 s...@home   Started download of ap_06my10af_B4_P0_00039_20100809_26735.wu
> 8/9/10 22:09:30 s...@home   Started download of ap_12ja10aa_B4_P1_00299_20100720_12222.wu
> 
> 
> 
> tasks for computer 2818173
> 
> Task        Work unit   Sent                     Status       Application
> 1680067243  635381158   10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
> 1680067092  642626371   10 Aug 2010 2:09:49 UTC  In progress  Astropulse v505
> 1680066966  642626352   10 Aug 2010 2:09:50 UTC  In progress  Astropulse v505
> 1680062556  642625860   10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
> 1680062552  642515167   10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
> 1680062479  642625851   10 Aug 2010 2:03:36 UTC  In progress  Astropulse v505
>
>
> Task        Work unit   Reported                 Status       Application
> 1677856015  641623392   10 Aug 2010 2:03:36 UTC  Completed    s...@home Enhanced
> 1677855960  641623387   10 Aug 2010 2:03:36 UTC  Completed    s...@home Enhanced
> -----------------------------------------------------------
> 
> So the first Scheduler request succeeded on the server: the two completions
> were reported and three new Astropulse tasks were "Sent" at 2:03:36 UTC, but
> that reply didn't get to my system.
> 
> The point is that the next Scheduler request REreported tasks, and that's
> prima facie evidence that my system had not received a previous reply. Of
> course not all requests are accompanied by reported completions, nor do
> all replies "Send" work, but it seems a shame not to act on that evidence.
> 
> I of course also preferred the sched_result.cpp logic which used to send
> messages back to the client saying "Completed result %s refused: result
> already reported as success". Changeset 21671 removed that information
> and even logging it on the server now needs debug_handle_results set. It's
> very much like sweeping dirt under the rug.
> --
>                                                 Joe

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
