I didn't have any of the reliable* config settings for accelerating retries set, but in case the scheduler was doing it automatically or because of some other setting, I added reliable_reduced_delay_bound with a value of 1.0, so if that was the cause, it should now have no effect.

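For what it's worth, here is a rough sketch of why I expect a value of 1.0 to neutralize the reduction. It assumes the report deadline is simply the send time plus the delay bound, scaled by that factor only for retries sent to reliable hosts. That is my reading of the ProjectOptions page, not something I have checked against the scheduler source, and the function and parameter names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def report_deadline(sent, delay_bound_s, is_retry, host_reliable, reduced_factor=1.0):
    """Hypothetical model of the deadline calculation (not BOINC code).

    delay_bound_s is a duration in seconds, not a timestamp.
    reduced_factor stands in for reliable_reduced_delay_bound and is
    assumed to apply only to retries sent to "reliable" hosts.
    """
    factor = reduced_factor if (is_retry and host_reliable) else 1.0
    return sent + timedelta(seconds=delay_bound_s * factor)


# Send time of the retry in question; 7-day delay bound from the work generator.
sent = datetime(2014, 5, 2, 21, 7, 30, tzinfo=timezone.utc)
seven_days = 7 * 86400

# With a factor of 1.0 the retry keeps the full 7 days.
print(report_deadline(sent, seven_days, True, True, 1.0))

# With a factor of 0.5 (an arbitrary example value) it would get 3.5 days.
print(report_deadline(sent, seven_days, True, True, 0.5))
```

If the model is right, any factor below 1.0 shortens the deadline only for retries on reliable hosts, and 1.0 leaves every deadline at the full delay bound.
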
As far as abandoned WUs go, a large number of the Collatz users have app_config and/or app_info files, and if there is an error in one of them, the host abandons all of its WUs at startup. I don't think the server ever gets notified when that happens. The WUs just disappear and the files are deleted when BOINC starts. Maybe it would be better if the client kept them around and aborted them or, better yet, put them on hold somehow, since the user generally figures out the error and fixes it.

Jon Sonntag

On Sat, May 3, 2014 at 3:47 AM, Richard Haselgrove <[email protected]> wrote:

> The workunit in question,
> http://boinc.thesonntags.com/collatz/workunit.php?wuid=5741706, has two
> results: the first had an initial deadline of 8 May (normal), and it's only
> the retry which has the reduced 2-day deadline. I imagine David's reference
> to http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Acceleratingretries is
> the likely explanation.
>
> But it raises a secondary question. Why does the first user's computer
> have 200 tasks marked as 'Abandoned'?
>
> http://boinc.thesonntags.com/collatz/results.php?hostid=115599&state=6
>
>
> This seems to be a relatively recent phenomenon (timescale 1 year / 18
> months). It can happen by deliberate user action (detach from project), but
> in every case I've investigated following a user report on a message board,
> the user has been adamant that they didn't detach or take any action which
> should plausibly lead to tasks being abandoned: indeed, they report that
> there is no sign on their computer that anything is wrong, and the tasks
> are still shown in BOINC Manager and are being computed and reported as
> normal. It's only when/if they visit their project account page - perhaps
> because they notice a reduced rate of credit being awarded - that they find
> their time and electricity is being used to no purpose.
>
> mark_results_over() in handle_request.cpp is supposed to be called when
> there is 'evidence' that the host has been detached/reattached, the
> statefile has been copied from another machine (corrupting rpc_seqno), or
> some major event like that. But it seems to happen in other cases too -
> most recently http://www.gpugrid.net/forum_thread.php?id=3740#36629.
>
> There seems to be some correlation in client logs with failed RPC
> attempts, perhaps reinforcing the rpc_seqno theory, but it really needs
> somebody with access to a server log to look into this. It greatly annoys
> the volunteers when they find a substantial volume of work (as in this
> case) has been thrown down the drain.
>
>
>
> >________________________________
> > From: David Anderson <[email protected]>
> >To: [email protected]
> >Sent: Saturday, 3 May 2014, 6:01
> >Subject: Re: [boinc_dev] Deadlines
> >
> >
> >The scheduler has an optional mechanism that reduces the latency bound
> >of results that
> >1) are retries
> >2) are being sent to a "reliable" host
> >See:
> >http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Acceleratingretries
> >
> >Check your config.xml to see if you're using this.
> >
> >-- David
> >
> >On 02-May-2014 9:41 PM, Jon Sonntag wrote:
> >> Anyone have any idea why a result would have a deadline of 2 days when the
> >> work generator has the delay bound set to 7 * 86400 or 7 days?
> >>
> >> Created: 2 May 2014, 19:57:23 UTC
> >> Sent: 2 May 2014, 21:07:30 UTC
> >> Report deadline: 4 May 2014, 3:36:58 UTC
> >> Received: 3 May 2014, 2:30:23 UTC
> >> Last time modified: 2 May 2014, 21:30:26 UTC
> >>
> >> The workunit has the following info according to the database:
> >>
> >> mysql> select FROM_UNIXTIME(create_time),FROM_UNIXTIME(delay_bound) from
> >> workunit where id=5741706;
> >> +----------------------------+----------------------------+
> >> | FROM_UNIXTIME(create_time) | FROM_UNIXTIME(delay_bound) |
> >> +----------------------------+----------------------------+
> >> | 2014-04-30 18:40:23        | 1970-01-14 18:00:00        |
> >> +----------------------------+----------------------------+
> >>
> >> Even if the deadline was related to the time the workunit record was
> >> created, it would still be May 6 and not May 4. Anyone have a theory? It
> >> seems to be happening randomly. For large WUs, it causes the client to
> >> panic and go into high priority mode and the end users are not happy when
> >> that happens.
> >>
> >> Jon Sonntag
> >> _______________________________________________
> >> boinc_dev mailing list
> >> [email protected]
> >> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> >> To unsubscribe, visit the above URL and
> >> (near bottom of page) enter your email address.
