Hello all felllow primefinders, I have followed this discussion of Haskell programs to generate small primes with the greatest interest. The thing is, I have been running my Haskell implementation of the so-called Elliptic Curve Method (ECM) of factoring for a number of years. And that method uses a sequence of small primes, precisely like the one discussed here.
I have been using the method of trial division to generate primes for the ECM. I have been aware that sieving would very likely be more efficient, but since the amount of time spent generating small primes is, after all, not an extremely large part of running the ECM program, I have postponed the task of looking into this. So this discussion of Eratostenes' sieve has provided a most wonderful opportunity to do something about this. Initially, I wanted to establish some sort of base line for generating small primes. This I did by writing my own C program for generating small primes, both using sieving and trial division. The result was most sobering: Finding primes with sieving with a C program is much faster than finding them with trial division. The speed advantage factor of sieving over trial division for small numbers (say, finding the first couple of hundred thousand primes) is around 5 and it gets larger with larger numbers. (I will not quote specific measurements here, only rough tendencies. Details clutter and Melissa is much better at this anyway. I would like to state, however, that I am, in fact, measuring performance accurately, but I use ancient machinery that runs slowly: Measuring performance on fast equipment often tends to cloud significant issues, in my experience.) So there is my goal now: Find a Haskell program that generates small primes that is around 5 times faster than the one I have already that uses trial division. And I mean a Haskell program, not a C program that gets called from Haskell. Sure, if I wanted ultimate performance, I could perhaps just call some C code from Haskell. But the insistance on pure Haskell widens the perspective and makes it more likely that the programmer or the language designer and implementer learn something valuable. The C sieving program uses an array to attain its impressive performance, so let us try that in Haskell as well. Something like this has been on my mind for a long time: accumArray (\x ->\y ->False) True (m,n) [(q,()) | q<-ps ] Haskell arrays, like every other value, cannot be changed, but accumArray provides a facility that we just might manage to use for our purpose. The particular array computed by the above expression using accumArray requires a low and high limit (m,n) of the numbers to be sieved as well as a list of sieving values - ps - numbers that need to be crossed out as composite in the interval (m,n) given. [The overhead involved in evaluating this expression with its unused () value and an indicator that is changed from True to False through throwing away two dummy values seems excessive. Any suggestion to write this up in a way that is more in agreement with the small amount of work that is actually required to do this would be most welcome. Or perhaps GHC (which is the compiler I use when I want things to run fast) takes care of things for me? In any case, this is my inner loop and even small improvements will matter.] When I tried this initially, I was actually not particularly surprised to find that it performed rather worse than trial division. Although I am unable to give a precise explanation, the GHC profiling runs indicated that, somehow, too much memory was accumulating before it was used, leading to thrashing and perhaps also a significant amount of overhead spent administering this excessive amount of memory. The initial sieve that I used had a fixed size that covered all the numbers that I wanted to test. An improvement might come about by splitting this sieve into smaller pieces that were thrown away when no longer needed. OK, what pieces? Well, a possibility would be to use the splitting of all integers into subsequences that matched the interval between squares of consecutive primes, because that would match the entry of a new prime into the set of primes that needs to be taken into consideration. And this actually worked rather well. In addition, the problem solved was now the one of expressing the generation of an infinite sequence of primes, not just the primes within some predetermined interval. Further (yes, you will get the code eventually, but be patient, I don't want to have to present more than the final version), there was the wheel idea also used by Melissa of somehow eliminating from consideration all the numbers divisible by a few of the smallest primes: If we, for example, could get away with considering only the odd numbers as candidates for being primes, this would potentially save half the work. This I didn't find to be an easy idea to implement. Essentially what I did was, both for the sequence of candidates and for the sequence of potential divisors, changing the representation into one in which consecutive numbers represented only the numbers that didn't have some basic set of primes as divisors. An example may help here. Let's say that we have selected 2 and 3 as factors to be eliminated from consideration. Then we have the numbers 1, 5, 7, 11, 13, ... that are neither divisible by 2 nor 3. And we establish a correspondence 0<->1, 1<->5, 2<->7, 3<->11, 4<->13, ... so that every number not divisible by 2 or 3 has precisely one representaton in [0..]. I call this the wheel representation. So, once we have emitted the special primes 2 and 3, we can limit our attention to the numbers whose wheel represention is the list [1..]. The situation becomes more complicated when considering the effective generation of the sequence of numbers that we use to cross off the numbers in this basic list. Taking the next prime 5, for example, we have to generate the list of divisors [25,30,..] in the wheel representation. This task is made more complicated by the fact that some of these numbers (like 30) should first be eliminated, because they are divisible by 2 or 3. The end result is a finite list of differences which can be repeated endlessly to generate the actual divisors in the wheel representation. Then there is the use of specialization, something that GHC can use to generate more efficient code for special instances of generic functions. The way I understand this, GHC can generate a special Int version of code that otherwise handles the general Integral class. But in order for this code to be selected for use by GHC, it must be called under circumstances where it is clear that the involved type is really Int. To capture the full potential of this seems to require some special handling in general. My trial division code was speeded up by a factor of about 5 using this technique. For the present sieving code, the factor was only about 2. But I believe that additional opportunities for speeding up using this technique remain. A final complication concerns the need to be able to specify a lower limit on the list of primes generated. This is for possible use of this code to generate primes for the ECM. Enough talk, the code can be found at thorkilnaur.dk/~tn/T64_20070303_1819.tar.gz and includes both the C code that I have used (ErathoS.c), a Haskell module ErathoS (ErathoS.hs), a simple driver for ErathoS (T64.hs), and a Shell script that demonstrates the basic possibilities (T64.sh). Overall, the speed compares well with the fastest of the sieves that we have seen in this discussion. And I am able to use sieving in Haskell, as in C, to gain a speed advantage over trial division. The Haskell code is still considerably slower than the C code in spite of the fact that the C code has been written mostly with an attention to correctness and could probably be improved rather easily. Best regards Thorkil _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe