[Haskell-cafe] Re: Genuine Eratosthenes sieve [Was: Optimization fun]

Melissa O'Neill Mon, 19 Feb 2007 10:14:44 -0800

Sorry, I'm a little late to this thread, but the topic of

sieve [] = []
sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

(as posted by Rafael Almeida) is somewhat dear to my heart, as Iwrote a paper about it last summer and sent it in to JFP. Stillwaiting for a reply though.

Let's go back to the beginning with the classic complaint and theexcuses...


Creighton Hogg wrote:

So a friend and I were thinking about making code faster inHaskell, and I was wondering if there was a way to improve thefollowing method of generating the list of all prime numbers. Ittakes about 13 seconds to run, meanwhile my friend's C version took0.1.


Here come the excuses, like this one from apfelmus,

While Haskell makes it possible to express very complicatedalgorithms in simple and elegant ways, you have to expect to pay aconstant factor (roughly 2x-10x) when competing against the samealgorithm in low-level C.

and this one from Nicolas Frisby,

I have yet to need my Haskell to perform well


Matthew Brecknell came up with something much faster, namely

primes :: [Int]
primes = 2 : filter isPrime [3,5..] where
  f x p r = x < p*p || mod x p /= 0 && r
  isPrime x = foldr (f x) True primes

FWIW, another way to write this code (without needing to think abouthow "fold bails early") is

primes = 2: [x | x <−[3,5..], all (\p −> x `mod` p > 0)(factorsToTry x)]

  where
    factorsToTry x = takeWhile (\p −> p*p <= x) primes

Both of these algorithms best the "sieve" we began with and runquickly, but as you can see (possibly more clearly from myrephrasing), this algorithm is not actually the Sieve of Eratosthenes-- it's actually a classic naive primes algorithm which checks anumber for primality by trying to divide it by every prime up to itssquare root.

But that's okay, because our initial algorithm ISN'T THE REAL SIEVEEITHER. Markus Fischmann hits the nail on the head when he says

The characteristics of a sieve is, that it uses the already foundprimes to generate a list of non-primes that is then removed from alist of candiates for primeness.

But then we get distracted by a discussion about avoiding division.It's true that the real sieve avoids division, but it is NOT truethat every algorithm that avoids division is the sieve. The threadends with this algorithm from Yitzchak Gale:

-- Delete the elements of the first list from the second list,
-- where both lists are assumed sorted and without repetition.
deleteOrd :: Ord a => [a] -> [a] -> [a]
deleteOrd xs@(x:xs') ys@(y:ys')
  | x > y       = y : deleteOrd xs  ys'
  | x < y       =     deleteOrd xs' ys
  | otherwise   =     deleteOrd xs' ys'
deleteOrd _ ys = ys

sieve (x:xs) = x : sieve (deleteOrd [x+x,x+x+x..] xs)
sieve _      = []

primes = sieve [2..]

Which seems reasonable, until you realize that every prime p we comeup with is still "considered" by k deleteOrd filters, where k is thenumber of primes that preceeded it.

So, let's recap: the original algorithm is beautiful and simple, butit is NOT the actual Sieve of Eratosthenes, NOT because it usesmodulus, but because fundamentally, at the highest level, it is adifferent algorithm. At the risk of beating a dead horse, let's seewhy it's not the real sieve.

What makes the sieve an efficient algorithm are the details of *what*gets "crossed off", *when*, and *how many times*. Suppose that we arefinding the first 100 primes (i.e., 2 through 541), and have justdiscovered that 17 is prime. We will begin crossing off at 289 (i.e.,17 * 17) and cross off the multiples 289, 306, 323, ... , 510, 527,making fifteen crossings off in total. Notice that we cross off 306(17 * 18), even though it is a multiple of both 2 and 3 and has thusalready been crossed off twice. (The starting point of 17 * 17 is apleasing, but actually *minor*, optimization for the *genuine* sieve.)

The algorithm is efficient because each composite number, c, getscrossed off f times, where f is the number of unique factors of cless than sqrt(c). The average value for f increases slowly, beingless than 3 for the first 10^12 composites, and less than 4 for thefirst 10^34.

None of the algorithms we've discussed correspond to the abovealgorithm. It is not merely that they don't perform "optimizations",such as starting at the square of the prime, or that some of then usea divisibility check rather than using a simple increment. Rather, ata fundamental level, they all work differently than the real sieve.Following our earlier example, after finding that 17 is prime, the"phony" algorithm will check to see if 19 is divisible by 17 (in thecase of Yitzchak's algorithm, divisibility is checked by comparingagainst 17*2), followed by 23, 29, 31, ... , 523, 527, checking atotal of ninety-nine numbers for divisibility by 17. In fact, even ifit did (somehow) begin at 289, it would examine all forty-fivenumbers that are not multiples of the primes prior to 17 (289, 293,307, ... , 523, 527). It is also worth realizing that the set ofnumbers examined by the phony algorithm is not a superset of the onesexamined by the real sieve (as we saw, the real algorithm crossed off306, which the phony algorithm skips). In general, the speed of thephony algorithm depends on the number of primes that are *not*factors of each composite, whereas the speed of the real algorithmdepends on the number of (unique) primes that *are*.

Hopefully at this point I've piqued your interest. Maybe you'rewondering what our original sieve algorithm is, if it isn't the realsieve and the mod issue is just a red herring...? Maybe you'rewondering what the the real sieve looks like, coded in Haskell. Ormaybe you're still not convinced about any of it. If so, I inviteyou to read a draft of my JFP paper, available at:


    http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

Best Regards,

    Melissa.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Genuine Eratosthenes sieve [Was: Optimization fun]

Reply via email to