John was suggesting that client errors and aborts also be counted, which is
a significant change from the current error_rate, updated only when
validation judges a result unworthy. On balance, I agree that a host
which reliably produces client errors of any type should not be considered
reliable for work issue.
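
For what it's worth, the change John is describing could be as simple as
folding client errors and aborts into the same moving average that validation
failures feed today. A rough sketch, with a made-up decay factor and names of
my own choosing (an illustration, not the actual scheduler code):

// Illustrative only, not BOINC scheduler code: one possible per-host
// error_rate update that treats client errors and aborts as failures,
// not just results judged invalid by the validator.
#include <algorithm>
#include <cstdio>

struct HostStats {
    double error_rate = 0.1;   // start new hosts off somewhat pessimistic
};

enum Outcome { VALID, INVALID, CLIENT_ERROR, ABORTED };

// Exponential moving average: decays toward 0 on a valid result,
// is pushed toward 1 on any kind of failure.
void update_error_rate(HostStats& h, Outcome o) {
    const double decay = 0.95;   // assumed smoothing factor
    double failure = (o == VALID) ? 0.0 : 1.0;
    h.error_rate = decay * h.error_rate + (1.0 - decay) * failure;
    h.error_rate = std::min(1.0, std::max(0.0, h.error_rate));
}

int main() {
    HostStats h;
    update_error_rate(h, CLIENT_ERROR);   // a client error now raises the rate
    update_error_rate(h, VALID);          // a valid result brings it back down
    std::printf("error_rate = %.3f\n", h.error_rate);
    return 0;
}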

However calculated, error_rate does not affect work which a host has
previously downloaded, only what is issued *after* the error_rate has grown
large enough to require validation. Raistmer's concern is largely based on
the scenario where a host doing GPU work has a large number of tasks on hand;
when the GPU gets into a bad state, it may return all of those as "success"
results that are actually bad. If there were a method to force another replication
for all "in progress" results which were sent unreplicated to the host, that
could minimize the damage.
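
Something along those lines is what I have in mind. The toy sketch below uses
made-up structures and names, not the real server code or database schema; the
real thing would presumably be a transitioner or ops query that bumps the
replication target of every workunit the host holds unreplicated:

// Toy sketch only, not BOINC code: given the results a host still has
// in progress, raise the replication target of any workunit that was
// sent to it unreplicated, so a second copy goes out to another host.
#include <cstdio>
#include <vector>

struct Workunit {
    int id;
    int target_nresults;            // how many replicas the project wants
    bool needs_transition = false;  // flag for a (hypothetical) transitioner pass
};

struct Result {
    int workunit_id;
    int host_id;
    bool in_progress;
};

// Hypothetical helper: force re-replication of everything this host
// holds that was issued without a wingman.
void force_revalidation(int host_id,
                        std::vector<Result>& results,
                        std::vector<Workunit>& wus) {
    for (const Result& r : results) {
        if (r.host_id != host_id || !r.in_progress) continue;
        for (Workunit& wu : wus) {
            if (wu.id == r.workunit_id && wu.target_nresults < 2) {
                wu.target_nresults = 2;      // ask for one more replica
                wu.needs_transition = true;  // let the transitioner act on it
            }
        }
    }
}

int main() {
    std::vector<Workunit> wus = { {1, 1}, {2, 2} };
    std::vector<Result> results = { {1, 42, true}, {2, 42, true} };
    force_revalidation(42, results, wus);
    std::printf("wu 1 target_nresults = %d\n", wus[0].target_nresults);  // now 2
    return 0;
}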

Consider a possible scenario where the children are allowed to use the host
for gaming when they have finished their homework, and the games leave the
GPU in a bad state. Such a host could transition from reliable to unreliable
daily, and hundreds of corrupted results could be assimilated each time. If
the host were turned off at bedtime, it would be in reliable condition when
turned on the next day.

The daily quota is no protection in scenarios like that if the host is also
doing CPU work for the same project. All it takes is one good CPU result for
every 49 bad GPU results to keep a daily quota of 100 at its maximum.
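
To spell the arithmetic out, assuming the quota rule works the way I understand
it (one subtracted per bad result, doubled per good result, capped at the
configured maximum):

// Quick arithmetic check; assumes the quota rule described above.
#include <algorithm>
#include <cstdio>

int main() {
    const int quota_max = 100;
    int quota = quota_max;

    quota -= 49;                              // 49 bad GPU results
    quota = std::min(quota * 2, quota_max);   // 1 good CPU result

    std::printf("quota after 49 bad + 1 good: %d\n", quota);  // back to 100
    return 0;
}
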
-- 
                                                            Joe

On 9 Nov 2009 at 8:56, David wrote:

> That's pretty much what it does.  Have you looked at the code?
> -- David
> 
> [email protected] wrote:
> > Adaptive replication should track a machine's validation and error history.
> > Machines that have high error rates (and the machine you are describing has
> > a high error rate) will have a very low chance of running without
> > validation.  On the other hand machines that never have validation errors
> > will have a very high chance of running solo.
> > 
> > The way I would do it is to store a success fraction per computer (1 -
> > (errors + aborts + invalid)/total tasks).  The calculation of whether to
> > actually issue another task after this one would be:  (R - (N + 1))*(1 - F)*C,
> > where R is the replication level requested by the project (one based),
> > N is the replication number of this replication (zero based), F is the
> > success fraction for this project on this computer (so 1 - F is its failure
> > fraction), and C is some constant to prevent computers that have regular
> > errors from ever running solo.  Since (R - (N + 1)) is 0 for the last
> > requested replicant, no others will be issued unless there is an error or a
> > late task.  If C is 10, then only hosts with better than a 90% success rate
> > will EVER run solo in a 2-replicant system.  C could be a project setting,
> > but it should never be allowed to be set to less than 1.  Arguably, 10 is
> > about right.
> >
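
To make the proposed test concrete, here is a toy version of it. This is my own
sketch with made-up numbers, written in terms of the failure fraction (1 - F);
it is not code from any project:

// Toy version of the proposed issue test, not BOINC code.
// Another replica is issued while (R - (N + 1)) * (1 - F) * C >= 1.
#include <cstdio>

// R: replicas requested by the project (one based)
// N: replication number of the copy just issued (zero based)
// F: the host's success fraction, 1 - (errors + aborts + invalid)/total
// C: safety constant, e.g. 10
bool issue_another(int R, int N, double F, double C) {
    return (R - (N + 1)) * (1.0 - F) * C >= 1.0;
}

int main() {
    // 2-replicant project, first copy just issued (N = 0), C = 10:
    std::printf("%d\n", issue_another(2, 0, 0.95, 10));  // 0: a 95% host runs solo
    std::printf("%d\n", issue_another(2, 0, 0.85, 10));  // 1: an 85% host gets a wingman
    std::printf("%d\n", issue_another(2, 1, 0.85, 10));  // 0: last requested replicant
    return 0;
}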


_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
