Re: [boinc_dev] Quorum 1, multiple results spawned

Slawomir Rzeznicki Wed, 13 Mar 2013 07:49:56 -0700

So what's the correct way to mark a result as invalid when quorum = 1?
The only other way I found is to mark it with validate error.

I use the 'validate error' option to mark results which had any kind of corruption in the output file (missing file, damaged file, damaged structure, unexpected data, etc.). This tells the other daemons that the result file is known to be bad, so they just ignore the result. Note: results with 'validate error' did not spawn more than 1 new result.

For me 'invalid' means that the output file is readable, the data structure is fine and can be parsed by the validator, but something is wrong with the output. Here at Enigma@Home this usually happens when the client runs a broken app or the host has serious hardware problems (faulty RAM frequently results in pretty much random data written to the output).
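The distinction drawn above can be sketched in a few lines of C++. This is only an illustration of the decision rule described in this message, not BOINC code: the `Verdict` enum, the `OutputCheck` struct and the `classify` function are hypothetical names, and the three flags stand in for whatever checks a project's validator actually performs.

```cpp
#include <cassert>

// Hypothetical verdicts for illustration; these are NOT BOINC's constants.
enum class Verdict { Valid, Invalid, ValidateError };

// Hypothetical summary of what a validator learned about one output file.
struct OutputCheck {
    bool file_present;    // the output file exists and could be opened
    bool structure_ok;    // the file parsed into the expected structure
    bool data_plausible;  // the parsed values passed the project's sanity checks
};

// Unreadable or structurally damaged output -> 'validate error'.
// Readable, parseable output with wrong content -> 'invalid'.
Verdict classify(const OutputCheck& c) {
    // Precondition: a file that is absent cannot have parsed correctly.
    assert(c.file_present || !c.structure_ok);
    if (!c.file_present || !c.structure_ok) {
        return Verdict::ValidateError;
    }
    if (!c.data_plausible) {
        return Verdict::Invalid;
    }
    return Verdict::Valid;
}
```

For example, under these assumptions `classify({true, true, false})` gives `Verdict::Invalid`, which corresponds to the faulty-RAM case: the file parses, but the data in it is wrong.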

Here is an example of outcome = 5, validate_state = 2 when running the 'fixed' validator. Without increasing target_nresults, only 1 more result was spawned and then correctly validated on return.

This is what I'd expect.
http://www.enigmaathome.net/static/3202xb55id/invalidgood.png

And here is the same thing, outcome = 5, validate_state = 2, when running the stock validator.

http://www.enigmaathome.net/static/3201xdqtwd/invalidbad.png
This one will end up in one of 3 possible cases:
http://www.enigmaathome.net/static/3203xd8zgu/invalidend1.png
http://www.enigmaathome.net/static/3204xguazq/invalidend2.png

The third case is the first result validated and the second marked as 'didn't need', because the server does not wait forever once the quorum is met (I think it waits for 6 hours, then it's too late).

Btw, I had exactly the same problem at radioactive@home.

Radioactive@home only collects data, so there is no need for any resends if the WU fails; basically the workunit is just a name with a number assigned, with no input files of any kind.

At first I marked bad results with 'validate error'; the server then generated a resend (which is unnecessary, but ok). I had to fall back to 'Invalid' after I received a couple of questions, because people were actually thinking it was the validator that was failing with 'validate error'.

So eventually I ended up with exactly the same problem: multiple results spawned for a single workunit, and this was very bad. The workaround I used there was setting 'max error results' to 1, so that after the first error the entire workunit is marked as bad.

Regards,
Slawomir Rzeznicki
http://www.enigmaathome.net

----- Original Message -----
From: "McLeod, John" <[email protected]>
To: "Slawomir Rzeznicki" <[email protected]>; "BOINC_dev" <[email protected]>
Sent: Wednesday, March 13, 2013 2:55 PM
Subject: RE: [boinc_dev] Quorum 1, multiple results spawned

The problem is that you are marking the result as invalid. This is a signal to the system to try again. Invalid does not mean a negative result; it means the result failed to compute correctly.


-----Original Message-----
From: boinc_dev [mailto:[email protected]] On Behalf Of Slawomir Rzeznicki
Sent: Tuesday, March 12, 2013 7:56 AM
To: BOINC_dev
Subject: [boinc_dev] Quorum 1, multiple results spawned

Hello,

Recently I've run into a problem at Enigma@Home.
The project uses quorum = 1, the other result settings are:

minimum quorum 1
initial replication 1
max # of error/total/success tasks 3, 10, 6

Everything is fine until the server receives a success result which is then
marked as invalid by the validator.
The validator leaves the outcome at RESULT_OUTCOME_SUCCESS and sets
validate_state to VALIDATE_STATE_INVALID.
The result is then marked as 'Completed, marked as invalid', just like I
want.

The problem is that the validator bumps the target_nresults by 1 which
results in spawning two results instead of one.

Now theoretically it's not a problem: here at Enigma@Home each result runs
on its own, and it doesn't really matter if I use 1 workunit -> 1 result or
1 workunit -> many results (I use 1:1 mainly because it makes validation
easier).
However, due to random host turnaround times, some of the 'bonus work' will
be cancelled by the server (if one of the hosts is way faster and the other
one contacts the server) or, what's worse, it will be wasted if the other
host does not contact the server for a long period of time.

The source of the problem is in validator.cpp line 586:

// if #success results >= target_nresults,
// we need more results, so bump target_nresults
// NOTE: nsuccess_results should never be > target_nresults,
// but accommodate that if it should happen
//
if (nsuccess_results >= wu.target_nresults) {
   wu.target_nresults = nsuccess_results+1;
   transition_time = IMMEDIATE;
}

Just as a test I commented out these lines and the problem is gone.
Does it count as a bug, at least when the project uses quorum = 1?
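To make the arithmetic concrete, here is a minimal stand-alone sketch of the behaviour described above. It is not BOINC code: `WorkunitSketch`, `bump_target_stock` and `results_to_spawn` are hypothetical names. The first function copies the condition from the quoted fragment; the second rests on an assumption, namely that the server creates enough new results to bring the number of still-usable results (unsent, in progress, or successful and not marked invalid) up to target_nresults.

```cpp
#include <cassert>

// Simplified stand-in for the workunit field discussed above.
// This is NOT the real BOINC WORKUNIT struct.
struct WorkunitSketch {
    int target_nresults;
};

// Same condition as the quoted validator.cpp fragment:
// if #success results >= target_nresults, raise the target by one.
void bump_target_stock(WorkunitSketch& wu, int nsuccess_results) {
    if (nsuccess_results >= wu.target_nresults) {
        wu.target_nresults = nsuccess_results + 1;
    }
}

// ASSUMPTION (not taken from the BOINC source): the number of new results
// created is the target minus the results that are still usable.
int results_to_spawn(const WorkunitSketch& wu, int nusable_results) {
    assert(nusable_results >= 0);
    int n = wu.target_nresults - nusable_results;
    return n > 0 ? n : 0;
}
```

With quorum = 1 and one returned result that was marked invalid, nsuccess_results is 1 and no usable results remain. After the bump the target is 2, so under the stated assumption two results are spawned; with the bump skipped the target stays at 1 and only one replacement is spawned, which matches what was observed.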

Regards,
Slawomir Rzeznicki
http://www.enigmaathome.net

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
