Astropulse does checkpoint quite frequently, and restarts without problems most of the time. "Abandoned" is definitely a server-side decision. It indicates a client detach, a reset, or some confusion about the identity of a host and whether it was working on those results. Other possibilities include multiple hosts using a copied or shared BOINC directory, multiple copies of BOINC on one host using the same BOINC client directory, or deleted, corrupted, or badly permissioned files in the BOINC client directory. Any of these could confuse the client or the server.
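
For illustration only, here is a minimal Python sketch of time-throttled checkpointing with restart, in the spirit of the "only once every X minutes" behaviour described in the quoted message. It does not use the BOINC API and is not how Astropulse is implemented; the class name, the JSON state layout, and the interval are all hypothetical.

```python
import json
import os
import tempfile
import time


class ThrottledCheckpointer:
    """Save state at most once per `min_interval` seconds (hypothetical helper)."""

    def __init__(self, path, min_interval=60.0, clock=time.monotonic):
        self.path = path
        self.min_interval = min_interval
        self.clock = clock
        self._last_save = clock()

    def maybe_save(self, state):
        """Write `state` only if enough time has passed; return True if written."""
        now = self.clock()
        if now - self._last_save < self.min_interval:
            return False
        self.save(state)
        self._last_save = now
        return True

    def save(self, state):
        # Write to a temporary file, then rename it over the old checkpoint,
        # so a power failure mid-write leaves the previous checkpoint intact.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def load(self, default):
        """Return the saved state, or `default` if none is readable."""
        try:
            with open(self.path) as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return default


def run(total_steps, checkpointer):
    """Toy computation (sum of 0..total_steps-1) that resumes from a checkpoint."""
    state = checkpointer.load({"step": 0, "acc": 0})
    while state["step"] < total_steps:
        state["acc"] += state["step"]
        state["step"] += 1
        checkpointer.maybe_save(state)
    return state["acc"]
```

After an interruption, calling `run` again with the same checkpoint path resumes from the last saved step instead of from zero. The interval trades disk writes against the amount of work lost on a power failure.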
Which client version and OS are you using?

On Fri, Aug 8, 2014 at 5:55 AM, McLeod, John <[email protected]> wrote:

> BOINC has a checkpointing mechanism built in, but it requires that the
> project developers write checkpoint code. Some projects can checkpoint
> almost any time, others can checkpoint only every few minutes, and some
> cannot checkpoint at all. SETI can checkpoint frequently (and instigated
> the mechanism to NOT do every possible checkpoint, but only once every X
> minutes). CPDN always checkpoints every time it can (typically every
> several minutes). I cannot remember an example of one that cannot
> checkpoint at all, but they exist.
>
> -----Original Message-----
> From: boinc_dev [mailto:[email protected]] On Behalf Of
> Richard Haselgrove
> Sent: Friday, August 08, 2014 4:48 AM
> To: Luc A. Germain; [email protected]
> Subject: Re: [boinc_dev] astropulse robustness / abandonned tasks
>
> The abandoning of tasks happens when the BOINC server 'thinks' that it has
> 'evidence' that the client has detached from the project and then
> re-attached again. This has affected a number of users in the past, but has
> proved extremely tricky to diagnose and resolve, not least because most of
> the evidence resides in the server logs.
>
> We did investigate one suspected case at Albert during credit testing, but
> that turned out to be a genuine 'detach' caused by hard disk failure. It
> is distinguished from reports like this one because no running tasks were
> left on the host computer (they were on the drive that failed...) to waste
> time and electricity.
>
> I would certainly welcome it if we could pair up a developer and a project
> administrator with access to server logs to investigate this problem and
> cure it at source.
>
> The checkpointing question is a matter for the project developers, and
> I'll leave it to them to respond via this list.
>
> >________________________________
> > From: Luc A. Germain <[email protected]>
> >To: [email protected]
> >Sent: Friday, 8 August 2014, 9:41
> >Subject: [boinc_dev] astropulse robustness / abandonned tasks
> >
> >Hi,
> >Two things:
> >1) A suggestion here for you developers ;-) As Astropulse tasks take
> >"some" time to complete, they are more prone to power failures such as we
> >have in the third world. When that happens, most of the time the task
> >restarts computing from the start (this is even more frustrating when the
> >task is near completion). Would it be possible to introduce regular
> >checkpoints by saving intermediate data, or work files, from which the
> >task could restart, thus saving a lot of computing time? Maybe this could
> >be an option in the user profile, as I guess not everyone needs this.
> >
> >2) Two days ago I sent a message about abandoned tasks. Since then, all my
> >computing goes to the garbage bin, as the results are not taken into
> >account. Which procedure should/could I try to solve this problem? Could
> >uninstalling/reinstalling the application on my computers be a solution?
> >Should I wait until the problem resolves itself (and would this not take
> >ages)?
> >
> >An answer would be highly appreciated.
> >
> >Best regards and thanks for your work,
> >Luc
> >_______________________________________________
> >boinc_dev mailing list
> >[email protected]
> >http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> >To unsubscribe, visit the above URL and
> >(near bottom of page) enter your email address.
