Daran wrote:

>>Peter Montgomery's dissertation, "An FFT Extension to the Elliptic Curve
>>Method of Factorization",
>>ftp://ftp.cwi.nl/pub/pmontgom/ucladissertation.psl.gz ,
>
>
> What do I need to read a psl file?

As far as I can see it is regular PostScript. I don't know if the
extra "l" indicates a variant of the format, but the file behaves
normally in Ghostscript and when printing.



> In addition to the remarks you go on to make, this will only be true for
> primes << the algebraic factor; intuitively I feel it ought only to be
> true for primes < sqrt(factor) ~= x (in the case of x^6-y^6). All these
> will be < B2.


>>However, primitive factors of x^n-y^n must be ==1 (mod n) and so the
>>factors do not behave particularly randomly at all...
>
>
> This doesn't make sense. Any primitive factor f of x^n-y^n is a
> primitive factor of x^(kn)-y^(kn) for any k, so the above would imply
> that f==1 (mod kn) for any k.
>
> What am I missing?

A prime factor p of x^n-y^n is called primitive if it divides no
x^m-y^m with 0<m<n. (So your f does divide x^(kn)-y^(kn), but it is
not a *primitive* factor of it, which resolves the apparent
contradiction.) Thus, with x and y distinct (mod p) and neither a
multiple of p,

x^n-y^n == 0 (mod p)
x^n == y^n (mod p)
(x/y)^n == 1 (mod p)

Since p is a primitive factor, ord(x/y) = n, and so n|p-1, i.e. p==1 (mod n).
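
A quick sanity check of this, as a small Python sketch of my own (the
values x=17, y=5, n=12 are arbitrary):

def primitive_prime_factors(x, y, n):
    # prime factors of x^n - y^n that divide no x^m - y^m, 0 < m < n
    def prime_factors(v):
        fs, q = set(), 2
        while q * q <= v:
            while v % q == 0:
                fs.add(q)
                v //= q
            q += 1
        if v > 1:
            fs.add(v)
        return fs
    return {p for p in prime_factors(x**n - y**n)
            if all((x**m - y**m) % p != 0 for m in range(1, n))}

for p in primitive_prime_factors(17, 5, 12):
    assert p % 12 == 1   # every primitive factor is == 1 (mod n)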

The probability that p|(x^n-y^n) is gcd(n, p-1)/(p-1) if x and y are
chosen independently at random from the residues != 0 (mod p).
Choose some value for x and let y range over all the possible residues
(mod p). x^n and y^n both land in the subgroup of d-th powers (mod p),
where d=gcd(n, p-1), and the order of that subgroup is (p-1)/d. As y
ranges over all the possible residues (mod p), y^n visits every element
of the subgroup d times and so has d matches with x^n. For a randomly
chosen y, the chance of a match x^n==y^n is d/(p-1), independent of x.

If we compute k different values of x^n-y^n in stage 2, the probability
of the prime p showing up at least once as a factor among them should be
1-(1-gcd(n, p-1)/(p-1))^k.
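
Both the d/(p-1) match probability and the at-least-once formula are
easy to check numerically; a small sketch of my own, with arbitrary
parameters:

from math import gcd

p, n, k = 101, 12, 480     # arbitrary small prime, Suyama power, pair count
d = gcd(n, p - 1)          # d = 4 here
x = 7                      # any fixed residue != 0 (mod p)
matches = sum(1 for y in range(1, p) if pow(x, n, p) == pow(y, n, p))
assert matches == d        # exactly d of the p-1 choices of y match
q = d / (p - 1)
print(1 - (1 - q) ** k)    # chance of p appearing among the k values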

However, in the P-1 stage 2 we don't choose x and y at random from all
the possible residues (mod p). We want the analysis for primes >B2, and
x=mD < B2 and y<D. How this changes the analysis is not quite clear to
me... At least the probability that the linear factors generate p should
be subtracted, as we know that their values are <p. That gives a
probability of 1-(1-(gcd(n, p-1)-2)/(p-1))^k for even n (where both x-y
and x+y are algebraic factors of x^n-y^n). As you pointed out, the
values of the other small terms may not be large enough to justify a
probability of 1/p for p dividing them, but I think the difference will
not be very great.

I have hacked up a small program to determine which primes (within a
given interval) appear as factors of (mD)^n-d^n for mD<B2, gcd(d,D)=1
and given n, and the counts come out reasonably close to the predicted
values. E.g. for D=2310, B2=1000000 and n=4, examining primes in
[1005000, 2000000] (only the x^2+y^2 term can produce these primes), I
get 9829 distinct primes that appear as factors vs a predicted 8737.26.
The sum of reciprocals of the primes that appear as factors is
0.007096, vs a predicted 0.006270. Curiously, the actual counts are
higher than the predicted values.
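
For reference, the core of such a program can be quite short; this is
a rough sketch of my own (not the actual program; sympy's primerange
is just a convenient prime generator, and the loop is unoptimized and
slow for the parameters above):

from math import gcd
from sympy import primerange

def appears_in_stage2(p, D, B2, n):
    # does p divide (m*D)^n - d^n for some m*D < B2, 0 < d < D,
    # gcd(d, D) = 1?  Compare n-th powers mod p instead of factoring.
    dn = {pow(d, n, p) for d in range(1, D) if gcd(d, D) == 1}
    return any(pow(m * D, n, p) in dn
               for m in range(1, (B2 - 1) // D + 1))

D, B2, n, lo, hi = 2310, 1000000, 4, 1005000, 2000000
hits = [p for p in primerange(lo, hi) if appears_in_stage2(p, D, B2, n)]
print(len(hits), sum(1.0 / p for p in hits))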

For B2=100000, D=2310, n=4 and p in [10000000, 110000000], the number
of primes that appear as factors is 2880 vs 2926.47, and the sum of
reciprocals is 0.000111 vs a predicted 0.000114.

For B2=100000, D=2310, n=6, primes in [10000000, 110000000], I get 5720 distinct primes as factors vs 5847.96 predicted, sum of reciprocals 0.000227 vs 0.000227. The predicted values seem to become more accurate for larger n and p.

These figures seem to support the gcd-2 idea, but I'd feel better if there were some theoretical grounds for it.

If we accept this estimate, the total chance of finding a factor f with
P-1, given B1 and B2 bounds, Suyama's power n and k distinct (x^n-y^n)
values computed in stage 2, comes out as

Pr[f-1 is B1-smooth] + (1 - Pr[f-1 is B1-smooth]) * Sum_{p>B1} 1/p * Pr[p included] * Pr[(f-1)/p is B1-smooth]

where Pr[p included] is 1 for p<=B2 and 1-(1-(gcd(n, p-1)-2)/(p-1))^k for p>B2.
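
As a sketch in code (pr_smooth(x, B) is a placeholder for Pr[a number
of size x is B-smooth], e.g. via Dickman's rho; it and the cutoff pmax
are assumptions of mine, not part of the formula):

from math import gcd
from sympy import primerange

def p1_success(f, B1, B2, n, k, pr_smooth, pmax):
    # estimated chance that P-1 with these parameters finds the factor f
    s1 = pr_smooth(f - 1, B1)
    s2 = 0.0
    for p in primerange(B1 + 1, pmax):
        if p <= B2:
            incl = 1.0
        else:
            d = gcd(n, p - 1)            # n even, so d >= 2
            incl = 1.0 - (1.0 - (d - 2) / (p - 1)) ** k
        s2 += incl / p * pr_smooth((f - 1) // p, B1)
    return s1 + (1.0 - s1) * s2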

The goal would then be to maximize the efficiency

(Pr[hit in stage 1] + Pr[hit in stage 2]) / (time for stage 1 + Pr[no hit in stage 1] * time for stage 2).


>>This causes "clumping" of primes included by Suyama's powers (some
>>primes appearing often, others not at all) and reduces their
>>efficiency...
>
>
> Could we use this somehow? Would it be worth skipping primes in the
> B1-B2 range that were highly likely to crop up in a Suyama power?

Yes, or compute which ones actually appear and eliminate them from the
stage 2 primes. Another idea is to exclude from stage 2 those primes
that appear as factors of (x+y). For D=2310 and B2/B1=100, primes
between 13 and 97 may divide (x+y) so that the cofactor is a prime in
the stage 2 interval. That requires a bitfield of all the stage 2
primes, which must be processed top-down, and that makes getting the
prime pairing right more difficult. I tried this once in my own P-1
factorer, but it has a bug that I haven't found yet. The saving was,
IIRC, ~10% of the multiplies in stage 2.
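
A sketch of the bookkeeping (my own illustration; the pairing rule
"x = smallest multiple of D above p" is a simplification of what a
real stage 2 does):

from sympy import primerange, factorint

def free_stage2_primes(D, B1, B2):
    # stage 2 primes q that come along for free as factors of (x + y)
    # when the pair covering p = x - y is processed (n even)
    stage2 = set(primerange(B1 + 1, B2 + 1))
    free = set()
    for p in stage2:
        x = (p // D + 1) * D        # smallest multiple of D above p
        y = x - p                   # 0 < y < D, and gcd(y, D) = 1
        for q in factorint(x + y):  # prime factors of x + y
            if q in stage2 and q != p:
                free.add(q)
    return free

print(len(free_stage2_primes(2310, 10000, 1000000)))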

All this can be generalised. The goal is to choose a subset of NxN
(pairs of positive integers) so that the efficiency is maximal for the
factor we hope to find. Of course we won't know the precise factor we
want to find, but we can integrate over all remaining candidate
factors, weighted with their respective probability of dividing the
input number.

A simple algorithm would be to take all the pairs (x,y) with x=mD for a
highly composite D and y<D, gcd(y,D)=1 (as in the current stage 2, but
not only pairs where x-y is a prime), factor all the x^n-y^n over
primes >B1 (and below some other bound), and make a bipartite graph
connecting each (x,y) pair with its factors. A greedy algorithm would
choose the best (x,y) pair (i.e. the one where the sum of reciprocals
of the connected prime factors is maximal) and remove those prime
factors and their edges from the graph. Repeat until the best remaining
(x,y) pair is worse than some chosen limit; see the sketch below.
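
A sketch of the greedy selection (hypothetical input: pairs_to_primes
maps each (x,y) pair to the set of primes >B1 dividing x^n-y^n, as
produced by the factoring step):

def greedy_plan(pairs_to_primes, min_value):
    # classic greedy weighted-cover heuristic: repeatedly take the pair
    # whose not-yet-covered prime factors have the largest sum of
    # reciprocals, until no remaining pair is worth min_value
    plan, covered = [], set()
    remaining = dict(pairs_to_primes)
    while remaining:
        best, best_val = None, min_value
        for xy, primes in remaining.items():
            val = sum(1.0 / p for p in primes - covered)
            if val >= best_val:
                best, best_val = xy, val
        if best is None:
            break
        plan.append(best)
        covered |= remaining.pop(best)
    return plan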

I doubt that the greedy algorithm is optimal, but it should be pretty
fast, except maybe for the factoring part.

The plan created this way can be stored and distributed to factoring
clients.

I haven't got around to implementing this yet, but it's on my list.
It's probably quite a bit of work, but it should considerably improve
the efficiency of stage 2.

>
> Daran

Alex

