First, here it is:
primes = 2: 3: sieve 0 primes' 5
primes' = tail primes
sieve k (p:ps) x
= [x | x<-[x,x+2..p*p-2], and [(x`rem`p)/=0 | p<-take k
primes']]
++ sieve (k+1) ps (p*p+2)
(thanks to Leon P.Smith for his brilliant idea of directly generating
the spans of odds instead of calling 'span' on the infinite odds
list).
This code is faster than PriorityQueue-based sieve (granted, using
my own
ad-hoc implementation, so YMMV) in producing the first million primes
(testing was done running the ghc -O3 compiled code inside GHCi).
The relative performance vs the PQ-version is:
100,000 300,000 1,000,000 primes produced
0.6 0.75 0.97
I've recently came to the Melissa O'Neill's article (I know, I
know) and
she makes the incredible claim there that the square-of-prime
optimization
doesn't count if we want to improve the old and "lazy"
uprimes = 2: sieve [3,5..] where
sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]
Her article gave me the strong impression that she claims that the
only way
to fix this code is by using Priority Queue based optimization, and
then
goes on to present astronomical gains in speed by implementing it.
Well, I find this claim incredible. First of all, the "naive" code
fires up
its nested filters much too early, when in fact each must be
started only
when the prime's square is reached (not only have its work started
there -
be started there itself!), so that filters are delayed:
dprimes = 2: 3: sieve (tail dprimes) [5,7..] where
sieve (p:ps) xs
= h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t))
where (h,t)=span (< p*p) xs
This code right there is on par with the PQ-base code, only x3-4
slower at
generating the first million primes.
Second, the implicit control structure of linearly nested filters,
piling up
in front of the supply stream, each with its prime-number-it-
filters-by
hidden inside it, needs to be explicated into a data structure of
primes-to-filter-by (which is in fact the prefix of the primes list
we're building itself), so the filtering could be done by one
function call
instead of many nested calls:
xprimes = 2: 3: sieve 0 primes' [5,7..] where
primes' = tail xprimes
sieve k (p:ps) xs
= noDivs k h ++ sieve (k+1) ps (tail t)
where (h,t)=span (< p*p) xs
noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
From here, using the brilliant idea of Leon P. Smith's of fusing
the span
and the infinite odds supply, we arrive at the final version,
kprimes = 2: 3: sieve 0 primes' 5 where
primes' = tail kprimes
sieve k (p:ps) x
= noDivs k h ++ sieve (k+1) ps (t+2)
where (h,t)=([x,x+2..t-2], p*p)
noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
Using the list comprehension syntax, it turns out, when compiled with
ghc -O3, gives it another 5-10% speedup itself.
So I take it to disprove the central premise of the article, and to
show
that simple lazy functional FAST primes code does in fact exist, and
that the PQ optimization - which value of course no-one can dispute
- is
a far-end optimization.
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe