Re: [boinc_dev] astropulse robustness / abandonned tasks

McLeod, John Fri, 08 Aug 2014 05:56:34 -0700

BOINC has a checkpointing mechanism built in, but it only works if the project 
developers write the checkpoint code.  Some projects can checkpoint at almost 
any time, others can checkpoint only every few minutes, and some cannot 
checkpoint at all.  SETI can checkpoint frequently (and prompted the 
mechanism to NOT take every possible checkpoint, but only one every X minutes).  
CPDN checkpoints every time it can (typically every several minutes).  
I cannot remember an example of a project that cannot checkpoint at all, but 
they exist.
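
To illustrate the "only once every X minutes" idea, here is a minimal sketch 
in Python. It is not BOINC code: the class ThrottledCheckpointer, the function 
run, the JSON state file and the toy "sum the step numbers" workload are all 
made up for the example. In a real BOINC application the throttling decision 
comes from the client through boinc_time_to_checkpoint(), and the application 
reports a finished checkpoint with boinc_checkpoint_completed(); the sketch 
only imitates that pattern with a timer.

```python
import json
import os
import tempfile
import time


class ThrottledCheckpointer:
    """Hypothetical helper: saves state at most once per `min_interval` seconds."""

    def __init__(self, path, min_interval, clock=time.monotonic):
        self.path = path
        self.min_interval = min_interval
        self.clock = clock
        self.last_saved = clock()

    def load(self, default):
        """Return the last saved state, or `default` if no usable file exists."""
        try:
            with open(self.path) as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return default

    def maybe_save(self, state):
        """Save only if enough time has passed since the previous save."""
        now = self.clock()
        if now - self.last_saved < self.min_interval:
            return False
        self._save(state)
        self.last_saved = now
        return True

    def _save(self, state):
        # Write to a temporary file first, then rename it over the old one,
        # so a power failure mid-write cannot leave a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def run(checkpointer, total_steps, stop_after=None):
    """Toy workload: resume from the last checkpoint and sum the step numbers.

    `stop_after` simulates a power failure after that many steps in this run.
    Returns (state, finished).
    """
    state = checkpointer.load({"step": 0, "acc": 0})
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return state, False  # in-memory state is lost at this point
        state["acc"] += state["step"]
        state["step"] += 1
        executed += 1
        checkpointer.maybe_save(state)  # a point where checkpointing is possible
    return state, True
```

After an interruption, calling run again with a checkpointer on the same file 
resumes from the last saved step instead of step 0, so the work lost is at 
most what was computed since the previous save. A project that can only 
checkpoint at certain points would call maybe_save only at those points.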


-----Original Message-----
From: boinc_dev [mailto:[email protected]] On Behalf Of 
Richard Haselgrove
Sent: Friday, August 08, 2014 4:48 AM
To: Luc A. Germain; [email protected]
Subject: Re: [boinc_dev] astropulse robustness / abandonned tasks

The abandoning of tasks happens when the BOINC server 'thinks' that it has 
'evidence' that the client has detached from the project and then re-attached 
again. This has affected a number of users in the past, but has proved 
extremely tricky to diagnose and resolve: not least, because most of the 
evidence resides in the server logs.

We did investigate one suspected case at Albert during credit testing, but that 
turned out to be a genuine 'detach' caused by hard disk failure - it is 
distinguished from reports like this one because no running tasks were left on 
the host computer (they were on the drive that failed...) to waste time and 
electricity.

I would certainly welcome it if we could pair up a developer and a project 
administrator with access to server logs to investigate this problem and cure 
it at source.

The checkpointing question is a matter for the project developers, and I'll 
leave it to them to respond via this list.



>________________________________
> From: Luc A. Germain <[email protected]>
>To: [email protected] 
>Sent: Friday, 8 August 2014, 9:41
>Subject: [boinc_dev] astropulse robustness / abandonned tasks
> 
>
>Hi,
>Two things:
>1) A suggestion here for you developers ;-) Because astropulse tasks take "some" 
>time to complete, they are more exposed to power failures like the ones we have 
>in the third world. When that happens, the task usually restarts computing from 
>the beginning (which is even more frustrating when the task was near 
>completion). Would it be possible to introduce regular checkpoints that save 
>intermediate data or work files, from which the task could resume, saving a lot 
>of computing time? Maybe this could be an option in the user profile, as I guess 
>not everyone needs this.
>
>2) Two days ago I sent a message about abandoned tasks. Since then, all my 
>computing has gone to the garbage bin, as the results are not taken into account. 
>Which procedure should/could I try to solve this problem? Could 
>uninstalling/reinstalling the application on my computers be a solution? 
>Should I wait until the problem resolves itself (and would that not take ages)?
>
>An answer would be highly appreciated.
>
>Best regards and thanks for your work,
>Luc
>_______________________________________________
>boinc_dev mailing list
>[email protected]
>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>To unsubscribe, visit the above URL and
>(near bottom of page) enter your email address.
>
>
>
