@BLM2:

I'll reply here one more time before switching to email if necessary, as a few 
peripheral issues are open and readers may want to know the outcome:

> 1) The Classical Sieve of Eratosthenes (CSoE) needs to search the whole 
> integer number space up to some N (see 
> [https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)).
>  The Optimized SoE (OSoE) seeks to reduce the number space (by skipping even 
> numbers, factors of 3, 5, etc.) but fundamentally still uses as its 
> starting/end points the whole number space up to N.

In fact, the linked Sieve of Eratosthenes article specifically mentions the use 
of wheel factorization and has tables showing the effect of "reduced number 
space"; for instance, wheel factorization using the prime factors of two, three, 
five, and seven (what you would call "PG7") reduces the required number of cull 
operations for a range of a hundred billion (10^11) from about 276 billion for 
the "classical" SoE to about 36.2 billion, and further to about 22.3 billion 
culling operations for what you would call "PG19". The odds-only SoE is just the 
trivial case of "PG2" and reduces the number of culling operations to about 113 
billion for this range, as per the table: 
[https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithmic_complexity](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithmic_complexity).

> The OSoEs talk nothing about Prime Generators, residues, etc.

The article doesn't talk about actual implementation methods for wheel 
factorization because that is beyond its scope of describing the SoE algorithm; 
however, it gives a formula by which one can calculate the estimated number of 
culling operations for a given range and can then confirm that estimate using 
whatever implementation one chooses. It also goes beyond your implementations 
in its use of optional "pre-culling", whereby for a given "PGX" one can 
pre-initialize the sieve buffer with a wheel pattern for even larger primes than 
"X", and it shows the resulting reduction in culling operations from doing this.
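
As a concrete way of confirming such counts, here is a rough Python sketch of my 
own (the function name, the small range, and the choice of only counting 
explicit markings are my assumptions, not the article's formula): it tallies the 
cull operations of an odds-only sieve and optionally pre-culls the multiples of 
3 and 5 by tiling a wheel pattern over the buffer.

```python
def count_culls(limit, precull=()):
    """Count the composite-marking ("cull") operations of an odds-only
    Sieve of Eratosthenes up to `limit`.  The odd primes in `precull`
    (e.g. (3, 5)) are handled by tiling a pre-computed wheel pattern
    over the buffer instead of being culled one multiple at a time,
    which is the "pre-culling" idea referred to above."""
    size = (limit - 1) // 2            # sieve[i] represents 2*i + 3
    sieve = bytearray(size)            # 0 = provisionally prime
    culls = 0

    if precull:
        period = 1
        for p in precull:
            period *= p                # marked indices repeat with this period
        pattern = bytearray(period)
        for p in precull:
            for i in range((p - 3) // 2, period, p):
                pattern[i] = 1         # mark odd multiples of p in one period
        for start in range(0, size, period):
            chunk = min(period, size - start)
            sieve[start:start + chunk] = pattern[:chunk]
        for p in precull:              # the wheel primes themselves stay prime
            sieve[(p - 3) // 2] = 0

    i = 0
    while (2 * i + 3) * (2 * i + 3) <= limit:
        p = 2 * i + 3
        if sieve[i] == 0 and p not in precull:
            for j in range((p * p - 3) // 2, size, p):
                sieve[j] = 1
                culls += 1             # one cull operation per marking
        i += 1
    return culls

print(count_culls(10**6))                  # plain odds-only ("PG2") count
print(count_culls(10**6, precull=(3, 5)))  # same range, 3 and 5 pre-culled
```

The two printed counts differ only because the multiples of 3 and 5 are laid 
down by copying the pattern rather than being counted as individual culls, which 
is exactly the kind of saving the article's pre-culling figures describe.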

> 2) My methods are fundamentally different in conception and implementation. 
> They are based on Prime Generator math...

Your methods use one of the most efficient **implementations** (perhaps the 
best in general terms), but as discussed above, "Prime Generator Math" is just 
wheel factorization math, which has been well known in the field for many years 
and is thus nothing unique.
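
To make the equivalence concrete: the "residues" of a prime generator are simply 
the totatives of the wheel modulus, i.e. the numbers below the product of the 
wheel primes that are coprime to that product. A few lines of Python (my own 
illustration, not anyone's library code) show this for the 2/3/5 wheel, which, 
if I follow your naming, would be "PG5":

```python
from math import gcd

def wheel_residues(wheel_primes):
    """Return the modulus of a factorization wheel and its residue
    classes: the numbers in 1..modulus-1 coprime to every wheel prime.
    These are exactly the "prime generator" residues -- every prime
    larger than the wheel primes falls into one of these classes."""
    modulus = 1
    for p in wheel_primes:
        modulus *= p
    return modulus, [r for r in range(1, modulus) if gcd(r, modulus) == 1]

print(wheel_residues((2, 3, 5)))
# (30, [1, 7, 11, 13, 17, 19, 23, 29]) -- the familiar eight mod-30 residues
```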

> They are also modern, as they are inherently structured to be optimally done 
> in parallel.

As I mentioned in my first post, your method of doing this in parallel is just 
one possible **implementation** and is exactly the one chosen by Professor 
Caldwell in the article I quoted, which he has used for perhaps the last 20 
years. Before that, there was a 1977 paper on an implementation for an IBM 
System/370 mainframe that implemented a page segmented SoE and, at the end of 
the article, proposed refinements amounting to wheel factorization for "reduced 
number space": 
[http://www.math.ubc.ca/~gerg/teaching/592-Fall2018/papers/1977.Bays.pdf](http://www.math.ubc.ca/~gerg/teaching/592-Fall2018/papers/1977.Bays.pdf).
 Application of wheel factorization to the SoE is well known in the art.
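
For readers who haven't met the term, "page segmentation" just means sieving the 
range in fixed-size chunks that fit in fast memory (cache today, core memory in 
1977), using a list of base primes up to the square root of the range computed 
once up front. The following odds-only Python sketch is purely illustrative of 
that structure and is not how Bays and Hudson, Caldwell, or any of the 
implementations under discussion actually code it (no wheel factorization, no 
bit packing, and the names are mine):

```python
from math import isqrt

def segmented_primes(limit, seg_size=1 << 15):
    """Generate the primes up to `limit` with a page-segmented,
    odds-only Sieve of Eratosthenes: find the base primes up to
    sqrt(limit) with a small plain sieve, then cull one fixed-size
    "page" of odd candidates at a time using only those base primes,
    so the working buffer stays small no matter how large `limit` is."""
    if limit < 2:
        return
    yield 2
    # Global index k stands for the odd number 2*k + 1 (k >= 1).
    root = isqrt(limit)
    small_len = root // 2 + 1
    small = bytearray(small_len)
    base_primes = []
    for k in range(1, small_len):
        if not small[k]:
            p = 2 * k + 1
            base_primes.append(p)
            for j in range(p * p // 2, small_len, p):
                small[j] = 1
    yield from (p for p in base_primes if p <= limit)
    # Now cull pages of the odd candidates above the base-prime region.
    top = (limit - 1) // 2                 # index of the largest odd <= limit
    lo = small_len                         # first index not covered above
    while lo <= top:
        hi = min(lo + seg_size, top + 1)
        page = bytearray(hi - lo)
        for p in base_primes:
            # First odd multiple of p that is >= max(p*p, 2*lo + 1).
            start = max(p * p, ((2 * lo + 1 + p - 1) // p) * p)
            if start % 2 == 0:
                start += p
            for j in range(start // 2 - lo, len(page), p):
                page[j] = 1
        for j, marked in enumerate(page):
            if not marked:
                yield 2 * (lo + j) + 1
        lo = hi

print(sum(1 for _ in segmented_primes(10**6)))   # 78498 primes below a million
```

The point of the structure is that the working buffer size is independent of the 
range, which is what made sieving to very large limits feasible on 1977 hardware 
and what keeps the buffer cache-resident today.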

Your implementation of running the individual modulo residue bit planes in 
parallel is not necessarily the best one in all situations, although it may be 
the best for the specific problem of finding the twin primes over the ranges 
you propose. Running each successive segment over all modulo residue bit planes 
per thread has the advantage that it scales to any number of threads, as on a 
supercomputer or a compute platform such as the computation facilities of a 
modern GPU; although the code for calculating the starting addresses per 
segment is a little more complex, it doesn't then need to be run as often.
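
As a sketch of that segment-per-task alternative, here is a minimal Python 
illustration, with a process pool standing in for real threads and a plain 
odds-only sieve standing in for the full set of modulo residue bit planes; every 
name and parameter here is my own assumption for illustration, not anyone's 
actual code:

```python
from concurrent.futures import ProcessPoolExecutor
from math import isqrt

def odd_base_primes(n):
    """Plain odds-only sieve for the odd base primes up to n (helper)."""
    buf = bytearray(n // 2 + 1)
    primes = []
    for k in range(1, len(buf)):
        if not buf[k]:
            p = 2 * k + 1
            if p > n:
                break
            primes.append(p)
            for j in range(p * p // 2, len(buf), p):
                buf[j] = 1
    return primes

def count_in_segment(args):
    """Cull one segment [low, high) of odd numbers by every base prime.
    Computing each base prime's starting address for this segment is the
    "little more complex" address calculation mentioned above."""
    low, high, base_primes = args
    size = (high - low + 1) // 2
    seg = bytearray(size)
    for p in base_primes:
        start = max(p * p, ((low + p - 1) // p) * p)
        if start % 2 == 0:
            start += p
        for j in range((start - low) // 2, size, p):
            seg[j] = 1
    return sum(1 for marked in seg if not marked)

def parallel_prime_count(limit, seg_span=1 << 18, workers=4):
    """Count the primes up to `limit` by handing whole segments to a pool
    of workers; each worker sweeps its own segment over all base primes,
    so the scheme scales with however many cores are available."""
    base_primes = odd_base_primes(isqrt(limit))
    count = 1 + len(base_primes)           # the prime 2 plus the base primes
    lo = isqrt(limit) + 1
    if lo % 2 == 0:
        lo += 1                            # segments cover odd numbers only
    tasks = []
    while lo <= limit:
        hi = min(lo + seg_span, limit + 1)
        tasks.append((lo, hi, base_primes))
        lo = hi
    with ProcessPoolExecutor(max_workers=workers) as pool:
        count += sum(pool.map(count_in_segment, tasks))
    return count

if __name__ == "__main__":
    print(parallel_prime_count(10**7))     # 664579 primes up to ten million
```

In a serious implementation the base primes would be shared rather than 
re-pickled with every task and each worker would keep one bit plane per wheel 
residue, but the shape is the same: one task per segment, with all planes swept 
inside the task, so adding cores just means consuming the task list faster.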

> It's interesting you raise an issue of naming my work after myself. I mean 
> after all, if you look on Wikipedia, besides the SoE, it has pages for the 
> Sieve of Atkin, even though Daniel J. Bernstein as a grad student did all 
> the programming for it 
> [https://en.wikipedia.org/wiki/Sieve_of_Atkin](https://en.wikipedia.org/wiki/Sieve_of_Atkin),
>  and Sieve of Sundaram 
> [https://en.wikipedia.org/wiki/Sieve_of_Sundaram](https://en.wikipedia.org/wiki/Sieve_of_Sundaram)...

First, the Sieve of Atkin is deservedly named after Professor Atkin because it 
is a unique sieve with an academic paper which has been defended and recognized 
as being distinctly different from the Sieve of Eratosthenes. In fact, it also 
uses "baked in" wheel factorization of what you would call "PG5" but that comes 
about naturally due to the unique algorithm based on quadratic equations. What 
the paper and reference implementation by Daniel Bernstein fail to acknowledge 
is that a practical page segmented version of the SoA is very difficult to 
realize and **does not show the stated O(N) asymptotic execution performance or 
even close** and also that **if one allows the compared SoE implementation to 
use maximum optimization including maximum wheel factorization it can never be 
faster for any practical range on any given CPU architecture**. Bernstein did 
an excellent job in his reference implementation, but other than being 
mentioned in the paper, he has nothing more to do with the SoA named after 
Atkin than if I took his code and improved it (which I have done, but it still 
can never be faster than a maximally optimized SoE due to the added 
complexities of its multiple varying-span "culling" operations as compared to 
the constant-span operations of the SoE).
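
For readers who haven't seen it, the basic (non-segmented) SoA toggles 
candidates according to the parity of the number of solutions of three quadratic 
forms and then removes numbers with square factors by culling with a span of 
n*n; it is those varying spans, together with the irregular work of the 
quadratic loops, that make a competitive page-segmented version so hard to 
achieve. A straightforward Python transcription of the commonly published 
pseudocode, purely as an illustration and nothing like Bernstein's optimized 
code, looks like this:

```python
from math import isqrt

def sieve_of_atkin(limit):
    """Basic (non-segmented) Sieve of Atkin: candidates are toggled by
    counting representations under three quadratic forms, then numbers
    with a square factor are removed by culling with step n*n -- the
    "varying span" culling contrasted with the SoE's constant spans."""
    if limit < 5:
        return [p for p in (2, 3) if p <= limit]
    sieve = bytearray(limit + 1)
    r = isqrt(limit)
    for x in range(1, r + 1):
        for y in range(1, r + 1):
            n = 4 * x * x + y * y
            if n <= limit and n % 12 in (1, 5):
                sieve[n] ^= 1
            n = 3 * x * x + y * y
            if n <= limit and n % 12 == 7:
                sieve[n] ^= 1
            n = 3 * x * x - y * y
            if x > y and n <= limit and n % 12 == 11:
                sieve[n] ^= 1
    # Square-free pass: cull by n*n for every remaining candidate n.
    for n in range(5, r + 1):
        if sieve[n]:
            for k in range(n * n, limit + 1, n * n):
                sieve[k] = 0
    return [2, 3] + [n for n in range(5, limit + 1) if sieve[n]]

print(sieve_of_atkin(60))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
```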

As to the Sieve of Sundaram, its only usefulness is that it is slightly simpler 
to implement than the SoE but is much slower due to culling by all odd numbers 
less than the square root of the range rather than culling by the base primes 
less than the square root of the range; if one makes the slight modification so 
as to cull only by the base primes, it is transformed into the "odds-only" SoE, 
which is a "PG2" wheel-factorized sieve. Letting Sundaram's name be associated 
with the simplification can be argued to be justified, as his simplification is 
a modification to the SoE algorithm itself, not just to the implementation.
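
For completeness, here is Sundaram's sieve in miniature (a Python sketch of my 
own; the function name is just illustrative). Marking i + j + 2ij for a fixed i 
strikes out exactly the indices of the odd multiples of 2i + 1, which is why the 
sieve effectively culls by every odd number up to the square root rather than 
only by the primes:

```python
def sieve_of_sundaram(n):
    """Sieve of Sundaram: from 1..k (k = (n-2)//2) remove every number of
    the form i + j + 2*i*j; each remaining i yields the prime 2*i + 1.
    For a fixed i this removes the indices of the odd multiples of
    2*i + 1, so the sieve culls by *every* odd number up to sqrt(n),
    composites included -- hence it is slower than the odds-only SoE."""
    k = (n - 2) // 2
    marked = bytearray(k + 1)
    i = 1
    while i + i + 2 * i * i <= k:
        j = i
        while i + j + 2 * i * j <= k:
            marked[i + j + 2 * i * j] = 1
            j += 1
        i += 1
    return [2] + [2 * i + 1 for i in range(1, k + 1) if not marked[i]]

print(sieve_of_sundaram(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Restricting the outer loop to those i that are still unmarked (i.e., where 
2i + 1 is prime) is precisely the slight modification that turns this into the 
odds-only SoE.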

However, calling the SoA the Sieve of Bernstein due to his implementation or 
the Sieve of Goodsman if I published my improvements to his implementation is 
not appropriate because the algorithm that makes it work is entirely due to 
Professor Atkin. Kim Walisch does not call "primesieve" the Sieve of Walisch 
even though he combines many unique refinements from many sources into the 
basic Sieve of Eratosthenes algorithm (including but not limited to "PG7" wheel 
factorization in a different form, pre-culling up to 19, and many other 
"tricks"). The implementation I call "Ultimate", which also uses "PG7" or 
"PG11" and pre-culling up to 19 or 23, does not deserve to be called the Sieve 
of Goodsman, as it combines the history of improvements to the basic Sieve of 
Eratosthenes, including wheel factorization, page segmentation, and the 
culminating touch that makes it faster than "primesieve": dividing the work 
into modulo residue bit planes as used by Caldwell/Bernstein et al. instead of 
packing multiple modulo residues into a byte as used by Walisch and others.

Thus, I don't think your implementation of the SoE, with its implementation-level 
refinements, should be named after you, but I'm not going to argue it further; 
it's up to you.

OTOH, your implementation "trick" of reducing the work done in finding the twin 
primes over a range is probably at least as unique as Sundaram's SoE 
simplification and I don't recall that being mentioned anywhere, so you can 
likely justify laying claim to that...
