His host seems to be losing track of RPC sequence numbers: in the server
log below it sends seqno 59642 when 59643 was expected.  Loss of cached
writes on restart?

2014-08-08 07:13:53.1883 [PID=28339]   [HOST#6960982] [USER#8522684] RPC seqno 59642 less than expected 59643; creating new host
2014-08-08 07:13:53.1896 [PID=28339]   [HOST#6960982] [USER#8522684] Found similar existing host for this user - assigned.
2014-08-08 07:13:53.1932 [PID=28339] [CRITICAL]  [HOST#6960982] [RESULT#3670788988] [WU#1562416658] changed CPID: marking in-progress result 03se08ad.16169.8252.438086664200.12.220_0 as client error!
2014-08-08 07:13:53.1932 [PID=28339]   Request: [USER#8522684] [HOST#6960982] [IP 41.79.224.134] client 7.2.42
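
For anyone following along, here is a rough sketch of the kind of check
that could produce a "seqno less than expected" line like the one in the
log. It is not the actual scheduler code: HostRecord, handle_rpc and the
field names are hypothetical, and the assumption that the server simply
compares each request against the last seqno it stored for the host is
mine.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    """Hypothetical stand-in for the server's per-host database row."""
    host_id: int
    last_seqno: int                      # highest seqno the server has seen
    in_progress: list = field(default_factory=list)
    abandoned: list = field(default_factory=list)


def handle_rpc(host: HostRecord, request_seqno: int) -> str:
    """Toy version of the server-side decision (assumed, not BOINC's code)."""
    if request_seqno < host.last_seqno:
        # The client has gone backwards, e.g. because its state file was
        # restored from an older copy or a write never reached the disk.
        # The server no longer trusts that this is the same installation,
        # so the tasks it believed were in progress are given up.
        host.abandoned.extend(host.in_progress)
        host.in_progress.clear()
        host.last_seqno = request_seqno
        return "abandoned"
    # Normal case: remember the newest seqno and carry on.
    host.last_seqno = request_seqno
    return "ok"


if __name__ == "__main__":
    host = HostRecord(host_id=6960982, last_seqno=59643,
                      in_progress=["example_task_0"])
    # The client comes back with an older seqno than the server remembers.
    print(handle_rpc(host, 59642), host.abandoned)
```

If the real check works anything like this, a client that loses its last
state write on an unclean shutdown would come back with a stale seqno and
trigger the abandon path with no action by the user.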



On Fri, Aug 8, 2014 at 9:17 AM, Richard Haselgrove <[email protected]> wrote:

> The same user appears to have suffered another 'abandon' event today:
>
> http://setiathome.berkeley.edu/results.php?hostid=6960982&state=6
>
> The reasons mentioned by Eric are all valid, but there appears to be an
> irreducible core of sporadic events which cannot be ascribed to user
> malfeasance. In earlier reports like this, many (but not all) of the cases
> appeared to be associated with long-distance and/or poor quality internet
> connections - again, like this one.
>
>   ------------------------------
>  *From:* Eric J Korpela <[email protected]>
> *To:* "McLeod, John" <[email protected]>
> *Cc:* "[email protected]" <[email protected]>; Richard
> Haselgrove <[email protected]>
> *Sent:* Friday, 8 August 2014, 16:56
>
> *Subject:* Re: [boinc_dev] astropulse robustness / abandonned tasks
>
> Astropulse does checkpoint quite frequently, and restarts without problem
> most of the time.  "Abandoned" is definitely a server side decision that
> indicates a client detach or a reset or some sort of confusion as to the
> identity of a host and whether it was working on those results.  (Other
> possibilities include multiple hosts using a copied or shared BOINC
> directory, multiple copies of BOINC on one host using the same BOINC client
> directory, deletion or corruption or bad permissions on files in the BOINC
> client directory, any of which could confuse client or server).
>
>
> Which client version and OS are you using?
>
>
> On Fri, Aug 8, 2014 at 5:55 AM, McLeod, John <[email protected]> wrote:
>
> > BOINC has a checkpointing mechanism built in, but it requires that the
> > project developers write checkpoint code.  Some projects can checkpoint
> > almost any time, and others can checkpoint only every few minutes, and
> > some cannot checkpoint at all.  SETI can checkpoint frequently (and
> > instigated the mechanism to NOT do every possible checkpoint, but only
> > once every X minutes).  CPDN always checkpoints every time it can
> > (typically this is several minutes).  I cannot remember an example of
> > one that cannot checkpoint at all, but they exist.
> >
> > -----Original Message-----
> > From: boinc_dev [mailto:[email protected]] On Behalf Of
> > Richard Haselgrove
> > Sent: Friday, August 08, 2014 4:48 AM
> > To: Luc A. Germain; [email protected]
> > Subject: Re: [boinc_dev] astropulse robustness / abandonned tasks
> >
> > The abandoning of tasks happens when the BOINC server 'thinks' that it
> > has 'evidence' that the client has detached from the project and then
> > re-attached again. This has affected a number of users in the past, but
> > has proved extremely tricky to diagnose and resolve: not least, because
> > most of the evidence resides in the server logs.
> >
> > We did investigate one suspected case at Albert during credit testing,
> > but that turned out to be a genuine 'detach' caused by hard disk failure
> > - it is distinguished from reports like this one because no running
> > tasks were left on the host computer (they were on the drive that
> > failed...) to waste time and electricity.
> >
> > I would certainly welcome it if we could pair up a developer and a
> > project administrator with access to server logs to investigate this
> > problem and cure it at source.
> >
> > The checkpointing question is a matter for the project developers, and
> > I'll leave it to them to respond via this list.
> >
> >
> >
> > >________________________________
> > > From: Luc A. Germain <[email protected]>
> > >To: [email protected]
> > >Sent: Friday, 8 August 2014, 9:41
> > >Subject: [boinc_dev] astropulse robustness / abandonned tasks
> > >
> > >
> > >Hi,
> > >Two things:
> > >1) A suggestion here for you developers ;-) As Astropulse tasks take
> > >"some" time to complete, they are more prone to power failures such as
> > >we have in the third world. When that happens, most of the time the
> > >task restarts computing from the start (this is even more frustrating
> > >when the task is near completion). Would it be possible to introduce
> > >regular checkpoints by saving intermediate data, or work files, from
> > >which the task could restart, thus saving a lot of computing time?
> > >Maybe this could be an option in the user profile, as I guess not
> > >everyone needs this.
> > >
> > >2) Two days ago I sent a message about abandoned tasks. Since then, all
> > >my computing goes to the garbage bin, as the results are not taken into
> > >account. Which procedure should/could I try to solve this problem?
> > >Could uninstalling/reinstalling the application on my computers be a
> > >solution? Should I wait until the problem resolves by itself (and would
> > >this not take ages)?
> > >
> > >An answer would be highly appreciated.
> > >
> > >Best regards and thanks for your work,
> > >Luc
> > >_______________________________________________
> > >boinc_dev mailing list
> > >[email protected]
> > >http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> > >To unsubscribe, visit the above URL and
> > >(near bottom of page) enter your email address.
> > >
> > >
> > >