----- Original Message -----
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
To: "Daran" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, December 04, 2002 2:55 PM
Subject: Re: Mersenne: P-1 and non k-smooth factors

> Let's see if I get this right.
>
> Overwhelmingly, the factors produced by P-1 factoring come out because
> they are smooth to the limits selected. The fraction that comes out
> because of the extension is << 10%. To double that fraction (i.e. to
> increase the total number of factors found by < 10%) we have to double
> the stage 2 run time?

That's not what I meant.

> Doesn't sound that great to me, even without worrying about memory
> considerations.
>
> If we're talking about the _extra_ computation time in stage 2 then
> obviously the suggestion makes a lot more sense -

Yes.  If it takes time t to run a stage 2 with E=2, and t+d to run it with
E=4, then we should be willing to spend t+2d to run it with E=6, and t+5d to
run it with E=12 (i.e. an extra cost over E=2 of about (E-2)/2 times d),
which, as far as I can see, is approximately how the cost increases anyway.

I've only had time to run a few tests, but on an 8.1M exponent with B1=45000,
B2=731250, and varying values of D and E, I get the following results:

  D    E   Stage 2 transforms
420    2        93946
420    4        98618   (the default for my memory settings)
420   12       121520
210    4       104272
150    4       111962
120    4       116464

I did one with D=420, E=6, but I don't seem to have made a note of it.  Nor
did I record running times, because I was using the computer for other
things, but transform counts ought to be reasonably proportional to running
time.

The first three tests are consistent with the cost increasing linearly with
E, while the last three show the penalty of lowering the memory.  (Note that
an unmodified client would drop to E=2 below D=210, but I wanted to see the
effect of lowering D on its own.)
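
As a sanity check on the linear model, the D=420 figures above can be run
through a few lines of Python (a throwaway sketch using only the numbers in
the table, nothing taken from the client source):

# Check the "cost grows linearly in E" model against the measured
# D=420 transform counts quoted above.
measured = {2: 93946, 4: 98618, 12: 121520}    # E -> stage 2 transforms

t = measured[2]                    # baseline cost at E=2
d = measured[4] - measured[2]      # extra cost of going from E=2 to E=4

for e, actual in sorted(measured.items()):
    predicted = t + d * (e - 2) // 2           # t, t+d, ..., t+5d
    err = 100.0 * (actual - predicted) / actual
    print("E=%2d  predicted=%6d  actual=%6d  error=%+.1f%%"
          % (e, predicted, actual, err))

E=12 comes out a few percent dearer than the straight-line extrapolation
predicts, but close enough for the argument here.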

With D=420, E=12 is over 23% dearer than the E=4 which is currently used.
However, this does *not* mean that it needs to find 23% more factors to be
worthwhile.  If guess_pminus1_bounds is altered to be aware of D and E, then
it compensates for the extra cost by reducing B2, trading 'normal' factors
for extended ones.  My (limited) tests suggest, as I said, that 2% more
factors with E=6, or 10% more with E=12, would be a worthwhile trade.  (This
assumes that my formula for estimating the number of squarings is accurate,
something I have not checked.)
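
To put rough numbers on that trade (a back-of-envelope sketch only; the
real change to guess_pminus1_bounds would have to account for stage 1 and
the B1/B2 split as well):

# If E=12 stage 2 costs ~1.23x as much per unit of B2 range as E=4
# (from the D=420 transform counts above), then holding the stage 2
# effort constant means shrinking B2 by roughly the same ratio.
b2_e4 = 731250
ratio = 121520.0 / 98618           # E=12 cost relative to E=4
b2_e12 = int(b2_e4 / ratio)        # roughly 593000
print("cost ratio %.3f -> equal-cost B2 of about %d" % (ratio, b2_e12))

The question is then whether the extended factors gained outweigh the
'normal' factors lost in the slice of B2 range given up.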

> ...though I think there needs to be a
> careful analysis as to what the extra computation time for actual E values
> might be...

I agree.  My tests have been limited to exponents in the 8.1M range, for no
reason other than that those are the ones I am doing.

> (as opposed to a rather simplistic linear model, which fails to take
> into account that some of the "temporaries" needed for small E probably
> drop out pretty well "for free").

I don't understand.

> If we _don't_ have to worry about memory, at some point it becomes
> cost-effective to run ECM with small limits instead of P-1 with much
> larger limits. And ECM can "easily" dig out some factors which are more
> or less inaccessible with P-1.

I was under the impression that ECM was only practical for small exponents,
well below the current DC range.

> [... snip ... I don't disagree but the basic argument is the same as
> above]
> >
> > In 2 out of the 29 stage 2 factors I have found so far using E=4, k has
> > not been smooth to B2.  This suggests that increasing E from 4 to 12
> > could yield about 20% more factors.  I've done a few tests with a
> > modified and recompiled client, which suggests that it would be worth it
> > even if E=12 yielded as few as 10% more factors, though I need to
> > investigate this further.
>
> That's a very small sample.

It's the only sample I have.  I'm trying to increase it by doing some P-1s
on exponents in the 1.2M range which have only been tested to B1=B2=1000.

> Some time ago I found a considerable number of first factors for exponents
> in the range 100,000-150,000 using P-1 with limits up to B1=10^6,
> B2=25x10^6.  The results.txt file doesn't record the E value used; though
> I did have tons of memory available (in relation to the exponent size) and
> seem to remember something about wondering what E=12 meant in the
> console output.  Or maybe I'm confusing this with recollections about
> running ECM?

That's possible as E is also used in the ECM code.  But E=12 output from a
P-1 means what it says.  If the value of E is not output, then it means E=2
(or perhaps E=1, but I've never tried using that little memory.)  E (>2) is
recorded in the results.txt file by recent clients, so either the version of
the client you were using then didn't record it, or your memory setting
didn't allow E>2.

> My records show 67 factors found; I mailed George on one occasion because
> P-1 found a factor which surprised me, but I don't think it happened
> twice.

How many of these were found during stage 2?  (If half your factors were
found during P-1 stage 2, and half of those used E=4 or greater, then your
single 'surprising' factor would not be out of line with my two.)

I have a total of 57 factors, including one found earlier today.  A few were
found by TF, 30 in P-1 stage 2 (including today's), and the rest in stage 1.

> Incidentally I found a factor only yesterday using P-1 on a production
> system:
>
> [Tue Dec  3 07:54:38 2002]
> P-1 found a factor in stage #2, B1=220000, B2=5665000.
> UID: beejaybee/Procyon, M17359099 has a factor: 312980494172935109497751
>
> Again no mention of E. If it helps, this system was set up to use 384
> MBytes memory...

Each temporary takes FFT size * 8 bytes, i.e. 896K * 8 = 7168KB per
temporary, so 384MB allows at most 54 temps (less a *lot* of overhead).

You need 56 temps to get to D=210, E=4.  Note also that there is a huge
margin of error compiled in, so the client never comes near using the full
amount allowed.  You'd probably have to increase the setting to 400MB or
more before it would get up to 56.

The next level down is 54 temps for D=180, E=2, which rarely happens.  (That
it can happen at all is a bug, because D=210, E=2 would use the same number
of temps to find the same factors, and be faster, but the client never
chooses this plan.)  Below that is D=150, E=2 at 46 temps, then D=120, E=2
with 38 temps.  My guess is that your factor was found using one of these.
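
The arithmetic, for what it's worth (a sketch; the 56/54/46/38 thresholds
are the ones above, but the overhead allowance is my guess, not the client's
actual reckoning):

# Estimate how many P-1 temporaries fit in the allowed memory, and which
# D/E plan that buys.  Per-temporary size is FFT length * 8 bytes.
def temps_available(memory_mb, fft_size_k, overhead_mb=24):  # overhead guessed
    per_temp_kb = fft_size_k * 8          # e.g. 896K FFT -> 7168KB each
    return (memory_mb - overhead_mb) * 1024 // per_temp_kb

plans = [(56, "D=210, E=4"), (54, "D=180, E=2"),
         (46, "D=150, E=2"), (38, "D=120, E=2")]

temps = temps_available(384, 896)
plan = next((name for need, name in plans if temps >= need), "stage 1 only")
print("%d temps -> %s" % (temps, plan))

With a 896K FFT, 384MB, and that guessed overhead this lands on D=150, E=2,
which is consistent with the guess above.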

> In any case this should have come out without extensions; B1=65267
> B2=3077953 is sufficient to find the factor with the "standard" stage 2
> algorithm.

E=2 implements prime pairing, but yields no extended factors.
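
To illustrate the difference (this is the textbook Brent-Suyama idea as I
understand it, not code lifted from the client): stage 2 with window D
evaluates terms of the form (m*D)^E - r^E with gcd(r, D) = 1, and a prime
is covered if it divides one of those terms.

# For E=2 the term is (mD - r)(mD + r), i.e. just the paired values; for
# larger even E the extra algebraic cofactors can contain primes well
# beyond B2, which is where the 'extended' factors come from.
from sympy import factorint

D, m, r = 210, 1, 13      # small illustrative values: 197 and 223 pair up
for E in (2, 4, 12):
    term = (m * D) ** E - r ** E
    print("E=%2d: primes dividing the term: %s" % (E, sorted(factorint(term))))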

> Would there be any means of retrieving actual factors found using P-1 and
> the E values used from the server logs? The problem otherwise is that, so
> far as the database is concerned, once a factor is found, nobody cares
> much how!

If the server log is the same as the client's prime.log, then the
information isn't there.

What I was thinking of doing was writing a program to read in the factor
database from, say, 1M up (to avoid any factors found by ECM), discard any
below the TF limit, then try to find the smallest B2 which would yield each
factor given E=4, 6, 8, 10, or 12.  This won't tell us the absolute
frequency of extended factors, but the relative frequencies would test, and
perhaps quantify, my conjecture about the relative effectiveness of these
values.
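
Something along these lines, perhaps (a minimal sketch of the standard
stage 2 test only; the function name is mine, and handling E > 2 properly
would mean also modelling the (m*D)^E - r^E terms visited for the D in
use):

# A factor f = 2kp+1 of 2^p-1 comes out of standard P-1 at bounds (B1, B2)
# if every prime power dividing k is <= B1, except for at most one prime
# in (B1, B2].  This finds the smallest such B2, given B1.
from sympy import factorint

def smallest_b2_standard(f, p, b1):
    k = (f - 1) // (2 * p)
    rough = [(q, e) for q, e in factorint(k).items() if q ** e > b1]
    if not rough:
        return 0              # k is B1-smooth: stage 1 alone finds f
    if len(rough) > 1 or rough[0][1] > 1 or rough[0][0] <= b1:
        return None           # needs a bigger B1, not just a bigger B2
    return rough[0][0]        # the single rough prime sets the minimum B2

# e.g. the factor of M17359099 from Brian's log above:
print(smallest_b2_standard(312980494172935109497751, 17359099, 65267))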

> Regards
> Brian Beesley

Daran

