Re: Mersenne: Multiple residues - enhancing double-checking

Mikus Grinbergs Thu, 5 Aug 1999 14:27:10 -0700
I agree with everything that Lucas said.

For the project to benefit, the participants would have to AGREE
to restructure the work.  Savings would result from stopping the
diverging work earlier (and, if "passing around" intermediate
files were accepted, from starting any triple-check work later).

In essence, the idea is to take that 1_year_to_complete LL test of
an exponent and BREAK IT DOWN into (n) discrete serial work units.
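
To make the "serial work units" idea concrete, here is a minimal sketch
in Python.  It is not how prime95 does it (prime95 uses FFT-based
squaring); the function names, the equal-sized split, and the choice of
reporting the low 64 bits at each boundary are assumptions made for
illustration only.

```python
def ll_work_unit(p, state, n_iters):
    """Advance the Lucas-Lehmer sequence s -> s*s - 2 (mod 2**p - 1) by n_iters steps."""
    m = (1 << p) - 1
    for _ in range(n_iters):
        state = (state * state - 2) % m
    return state


def split_iterations(p, n_units):
    """Split the p - 2 LL iterations into n_units nearly equal serial chunks."""
    base, extra = divmod(p - 2, n_units)
    return [base + (1 if i < extra else 0) for i in range(n_units)]


def residue64(state):
    """Low 64 bits of the intermediate value: the (work unit "final") residue."""
    return state & 0xFFFFFFFFFFFFFFFF


def run_in_units(p, n_units):
    """Run the whole test for an odd prime p, one work unit at a time.

    Returns (is_prime, residues), where residues[i] is what would be
    reported to the server at the end of work unit i + 1.
    """
    state = 4  # standard LL starting value
    residues = []
    for size in split_iterations(p, n_units):
        state = ll_work_unit(p, state, size)  # 'state' is the intermediate file
        residues.append(residue64(state))
    return state == 0, residues
```

The full value of `state` at a unit boundary plays the role of the
intermediate file; only the short residue needs to be compared.  The
final answer does not depend on how many units the test is cut into.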

mikus


Proposed procedural steps:

(b) When 'Tester 1' requests work, assign ONLY work unit (1) to him.
    However, the project RESERVES exponent (EEEE) to 'Tester 1' -
    only *he* will (in the future) be assigned initial testing of
    further work units within exponent (EEEE).

(c) Eventually, Tester 1 finishes that work unit, and begins to
    work on some other work unit on some other exponent.  HOWEVER,
    'Tester 1' saves the intermediate file (of how far he got with
    exponent EEEE).

    Somewhere along the line, exponent (EEEE) work unit (1) becomes
    available to be assigned to a double-checker.

(d) Now, when 'Checker 2' requests work, assign exponent (EEEE) work
    unit (1) to him for double-checking.

    Since this is work unit (1), there is as yet no intermediate
    file for exponent (EEEE).  If, however, this was a subsequent
    work unit, and Checker 2 was *not* the person who performed
    the preceding work unit's test/check, then Checker 2 might
    OBTAIN the intermediate file from whoever did that preceding
    work unit.

(e) When Checker 2 finishes work unit (1) of exponent (EEEE), his
    (work unit "final") residue is compared to the residue reported
    for work unit (1) by Tester 1.  HOWEVER, 'Checker 2' saves the
    intermediate file (of how far he got with exponent EEEE).

(f) If the residues match, exponent (EEEE) work unit (2) becomes
    available to be assigned to 'Tester 1'.

    Now, 'Tester 1' and 'Checker X' repeat steps (b) - (e), until
    the residues of the FINAL work unit on exponent (EEEE) agree.

    ----
    If the residues of the most recently tested AND checked work
    unit of exponent (EEEE) do not match, then that work unit
    becomes available to be assigned to a triple-checker.

(g) Now, when 'Checker 3' requests work, assign the diverging work
    unit within exponent (EEEE) to him for re-checking.

    Checker 3 might OBTAIN from Tester 1 the intermediate file for
    exponent (EEEE) as of two work units back.  Checker 3 will
    perform the first of those two work units (i.e., the one
    previous to the divergence) to verify that his (work unit
    "final") residue matches the (known good) one from Tester 1.

(h) Checker 3 will then go on to triple-check the diverging work
    unit.  When Checker 3 finishes that work unit of exponent
    (EEEE), his (work unit "final") residue is compared to the
    residue reported by Tester 1.

(i) If the residues match, the next work unit of exponent (EEEE)
    becomes available to be assigned to 'Tester 1', and testing
    and checking continue the same as with a successful step (f).

    ----
    If Checker 3's (work unit "final") residue does not match
    Tester 1's, but does match Checker 2's, then exponent (EEEE)
    is un-reserved from 'Tester 1', and is instead RESERVED to
    'Checker 2'.  Testing and checking of the work units of (EEEE)
    continue with step (i), where 'Checker 2' replaces 'Tester 1'.

    ----
(k) If Checker 3's (work unit "final") residue on the diverging
    work unit of exponent (EEEE) matches neither Tester 1's nor
    Checker 2's, maybe the best thing to do is to start all over
    with a new set of participants.
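
The decisions in steps (f) through (k) can be sketched as a small
function.  This is only an illustration of the comparison rules above;
the names `Outcome` and `arbitrate` are made up here, and a real server
would also have to track reservations and intermediate files.

```python
from enum import Enum


class Outcome(Enum):
    NEXT_UNIT = "release the next work unit to the reserved owner"
    TRIPLE_CHECK = "assign the diverging work unit to a triple-checker"
    TRANSFER_OWNER = "reserve the exponent to Checker 2 and continue"
    RESTART = "start all over with a new set of participants"


def arbitrate(tester, checker, third=None):
    """Decide what happens to one work unit from its reported residues.

    tester  -- residue reported by 'Tester 1' (the reserved owner)
    checker -- residue reported by 'Checker 2'
    third   -- residue reported by 'Checker 3', or None if no
               triple-check has been run yet
    """
    if tester == checker:
        return Outcome.NEXT_UNIT        # step (f), residues match
    if third is None:
        return Outcome.TRIPLE_CHECK     # step (f), residues diverge
    if third == tester:
        return Outcome.NEXT_UNIT        # step (i), Tester 1 confirmed
    if third == checker:
        return Outcome.TRANSFER_OWNER   # step (i), Checker 2 confirmed
    return Outcome.RESTART              # step (k), three-way disagreement
```

For example, `arbitrate(0xAB, 0xCD)` asks for a triple-check, and
`arbitrate(0xAB, 0xCD, 0xCD)` moves the reservation to Checker 2.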


Comment:  If the breaking down of a long LL computation into discrete
          units of work is adopted, thought might be given to having
          prime95 generate a "universal interchange" intermediate
          file at the end of a unit of work - in other words, special
          output that would be acceptable as input across a number of
          operating systems and/or hardware chip types.
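
As a rough sketch of what such an interchange file might hold: the
layout below (JSON text, big-endian hex residue, CRC-32) is purely
hypothetical and is NOT prime95's save-file format.  The point is only
that a plain integer written in a fixed byte order, plus a checksum,
does not depend on the machine that wrote it.

```python
import json
import zlib


def dump_checkpoint(p, iteration, state):
    """Serialize an intermediate LL value as portable text (hypothetical layout)."""
    nbytes = (p + 7) // 8  # state < 2**p - 1, so this many bytes always suffice
    residue_hex = state.to_bytes(nbytes, "big").hex()
    record = {
        "exponent": p,
        "iteration": iteration,
        "residue_hex": residue_hex,
        "crc32": zlib.crc32(residue_hex.encode("ascii")),
    }
    return json.dumps(record)


def load_checkpoint(text):
    """Parse a checkpoint and verify its checksum; returns (p, iteration, state)."""
    record = json.loads(text)
    if zlib.crc32(record["residue_hex"].encode("ascii")) != record["crc32"]:
        raise ValueError("checkpoint is corrupt")
    return record["exponent"], record["iteration"], int(record["residue_hex"], 16)
```

A file written this way on one platform can be read back on another
with `load_checkpoint`, and a damaged transfer is rejected rather than
silently used as the starting point of the next work unit.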


Note:     For the sake of "fairness", becoming the 'reserved owner'
          on a 10-unit exponent might require 10 'credits' earned
          through doing checking.




In article <[EMAIL PROTECTED]>,
Lucas Wiman  <[EMAIL PROTECTED]> wrote:
> > This idea is rather obvious, and no, I don't remember seeing it either.
>
> This had been discussed earlier.  Brian and I talked about it for a little
> while, he came up with the original idea.
>
> > I think the idea has definite merit.  If an error does occur, it's equally
> > likely to happen at any step along the way, statistically.  Errors are every
> > bit as likely to happen on the very first iteration as they are during the
> > 50% mark, or the 32.6% mark, or on the very last iteration.
>
> True, but if the system is malfunctioning then the errors should start
> early.
>
> > Especially as the exponents get larger and larger, I see a *definite*
> > possibility to reduce double check times by having first time LL tests
> > report residues at certain "percentages" along the way.
>
> Yeah.  The error rate should be proportional to the runtime, which
> increases with the square of the exponent (ouch!).
>
> > Just for example, every 10% along the way, it'll send its current residue
> > to the Primenet server.
>
> I'm guessing that you mean a certain amount of the residue.  Sending in
> 10 2meg files for *each* exponent in the 20,000,000 range would get very
> unwieldy, and inconvenient for people and primenet.
>
> Of course, this would only help if we were running more than one test for the
> same exponent at the same time (otherwise, this would just be a pointless
> way to do a triple check).  They would either have to be coordinated
> (running at the same time, logistical nightmare), or (as Brian suggested)
> have a "pool" of exponents running on one computer.  That is to say when
> one computer finishes to X%, it reports its 64-bit residue to primenet, and
> waits for the second computer working on the same LL test to do the same.
> Until the other (slower) computer reports in, the (faster) computer works on
> another exponent.
>
> This would speed up the entire project, but it would slow down the individual
> exponent, which would make people mad :(.
>
> > I forget the numbers being tossed around,
> > but you'd only save 50% of (the error rate) of the
> > checking time.
>
> As I pointed out above, the error rate should increase with the square of the
> exponent (plus change).  This means that if 1% have errors at 7mil, 22% will
> have errors at 30mil.
>
> -Lucas Wiman
>
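
To put numbers on the scaling Lucas describes, here is a small sketch.
It assumes only what the quoted text says (error rate proportional to
runtime, runtime growing as a power of the exponent); the function
names are made up for illustration.

```python
import math


def scaled_error_rate(rate_small, p_small, p_large, power=2.0):
    """Scale an error rate from exponent p_small to p_large, assuming rate ~ p**power."""
    return rate_small * (p_large / p_small) ** power


def implied_power(rate_small, rate_large, p_small, p_large):
    """Power k for which rate ~ p**k reproduces both quoted error rates."""
    return math.log(rate_large / rate_small) / math.log(p_large / p_small)


# Pure square law: 1% at 7 million scaled up to 30 million.
pure_square = scaled_error_rate(0.01, 7e6, 30e6)

# Effective power needed to reach the quoted 22% at 30 million.
k = implied_power(0.01, 0.22, 7e6, 30e6)
```

A pure square law gives roughly 18% at 30 million; the quoted 22%
corresponds to an effective power a little above 2, which is consistent
with the "plus change" in Lucas's wording.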

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers