Re: Fastest Prime Sieve, in Nim

BLM2 Fri, 21 Jun 2019 14:00:51 -0700

Thanks for taking time to run the code. First I'll comment on some of the 
technical issues you raised, then the others (not in order).

`Your results are as valid for AMD or other processors as they are for Intel...`

Good. I'm glad to see it seems consistent across different CPU systems.

`` It is just a Sieve of Eratosthenes with a high degree of wheel factorization
applied using residue bit planes as has been well known, at least by Bernstein
of Bernstein and Atkin fame... ``

Actually, no. The algorithm attributed to Eratosthenes (more ancient
societies/cultures knew how to identify primes well before him) is conceptually
and fundamentally different from mine.

1) The Classical Sieve of Eratosthenes (CSoE) needs to search the whole integer
number space up to some N (see
[https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)).
The Optimized SoEs (OSoEs) seek to reduce the number space (by skipping even
numbers, multiples of 3, 5, etc.) but fundamentally still use the whole number
space up to N as their starting/end points. The OSoEs say nothing about Prime
Generators, residues, etc.

2) My methods are fundamentally different in conception and implementation.
They are based on Prime Generator math. You always start with a greatly reduced
number space that contains all the primes up to N, so you always inherently do
less work (sieving prime multiples) to identify the primes. They are also
modern, as they are inherently structured to be optimally done in parallel.
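
To make the contrast between 1) and 2) concrete, here is a minimal Python sketch of sieving only inside a reduced number space. It is not the SSoZ code. It assumes a Prime Generator can be modelled as a modulus (30 here) together with the residues coprime to it, and the function names are made up for illustration.

```python
from math import gcd

def prime_factors(m):
    """Distinct prime factors of m, by trial division."""
    out, d = [], 2
    while d * d <= m:
        if m % d == 0:
            out.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        out.append(m)
    return out

def pg_residues(modulus):
    """Residues in [1, modulus) coprime to modulus (assumed PG model)."""
    return [r for r in range(1, modulus) if gcd(r, modulus) == 1]

def pg_candidates(n, modulus):
    """All values k*modulus + r with 1 < value <= n, in ascending order."""
    res = pg_residues(modulus)
    return [k + r for k in range(0, n + 1, modulus) for r in res if 1 < k + r <= n]

def sieve_in_pg_space(n, modulus=30):
    """Primes <= n, marking only products of two prime candidates."""
    base = [p for p in prime_factors(modulus) if p <= n]  # primes dividing the modulus
    cands = pg_candidates(n, modulus)
    composite = set()
    for i, p in enumerate(cands):
        if p * p > n:
            break
        if p in composite:
            continue
        # p * c is itself coprime to the modulus, so it is always a candidate:
        # the sieve never leaves the reduced number space.
        for c in cands[i:]:
            if p * c > n:
                break
            composite.add(p * c)
    return sorted(base + [c for c in cands if c not in composite])

print(pg_residues(30))
print(sieve_in_pg_space(100))
```

With modulus 30 only 8 of every 30 integers (about 27%) are ever examined, and a larger modulus shrinks that fraction further.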

I am not aware of anything in writing that comes close to describing the
totality of an algorithm similar to mine, and certainly have not seen any coded
implementation of such an algorithm. I would urge you to read my prior papers
cited as references.

`` However, you are right that it is a great algorithm and something that Kim
Walisch of primesieve missed as he persisted in byte packing the PG5 residues
into a byte "because it works out - eight values in eight bits" and missed that
by so doing he is reducing the effective range of each CPU L1 cache range with
a subsequent loss of efficiency. ``

I first emailed Kim my "The Segmented Sieve of Zakiya (SSoZ)" when released in
2014 (5 years ago), and subsequent papers. He knows about my work, and if you
start looking at his iterations of Primesieve you'll see it has incorporated
aspects of it. And that's good! I was trying to make him aware of how to do his
sieve better (and get him to do a C++ version of mine).

But doing wheel factorization is just that, a means to reduce the number space
starting from the total number space up to N. The optimum approach is to stay
strictly within the number space of the Prime Generators, which are optimal by
their structure for their given size.

`` Now, I am in process of writing a Sieve of Eratosthenes that uses all of the
optimizations I mention here and more that should be up to about 20% faster
than primesieve even for just counting primes, and if I adapted your twin
primes technique to that it would also find twin primes much much faster.

It's also easy to see the optimizations that could make your code about twice
as fast as it is currently... ``

Great! I look forward to seeing your code, and benchmark results.

There are two straightforward ways to further optimize my implementation: 1)
Optimize the tuning algorithm to select the best segment size and PG for a
given N or range, and 2) Give it more threads! In fact, this algorithm screams
out for GPU implementations (CUDA/NVIDIA, OpenCL, etc.), and distributed
networks (clouds).
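
As a rough illustration of why more threads help, the Python sketch below gives each residue track of a modulus to its own task, and no task needs anything from the others. It is a simplification, not the actual implementation: it counts primes rather than twin primes, uses no segmentation, fixes the modulus at 30, and uses hypothetical function names. Python threads also share the GIL, so this shows the structure of the work split, not a speedup. `pow(modulus, -1, p)` needs Python 3.8+.

```python
from concurrent.futures import ThreadPoolExecutor
from math import gcd, isqrt

def small_primes(limit):
    """Primes <= limit by trial division (fine for small limits)."""
    return [p for p in range(2, limit + 1)
            if all(p % d for d in range(2, isqrt(p) + 1))]

def count_track(r, n, modulus, sieving_primes):
    """Count primes of the form k*modulus + r that are <= n."""
    if n < r:
        return 0
    kmax = (n - r) // modulus            # track slots are k = 0..kmax
    alive = bytearray([1]) * (kmax + 1)
    if r == 1:
        alive[0] = 0                     # the value 1 is not prime
    for p in sieving_primes:
        # smallest k with k*modulus + r divisible by p
        k = (-r * pow(modulus, -1, p)) % p
        if k * modulus + r == p:
            k += p                       # keep p itself
        for j in range(k, kmax + 1, p):
            alive[j] = 0
    return sum(alive)

def count_primes_by_track(n, modulus=30, workers=4):
    residues = [r for r in range(1, modulus) if gcd(r, modulus) == 1]
    # sieving primes must be coprime to the modulus
    sieving = [p for p in small_primes(isqrt(n)) if modulus % p]
    # primes dividing the modulus are counted separately
    base = sum(1 for p in small_primes(modulus) if modulus % p == 0 and p <= n)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda r: count_track(r, n, modulus, sieving), residues)
        return base + sum(counts)

print(count_primes_by_track(1000))
```

Because the tracks are independent, the same split maps naturally onto real threads, GPU blocks, or separate machines. The only shared input is the read-only list of sieving primes.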

Ok, now to address your non-technical comments.

`` While it's great that you are doing innovative things with Nim as your
chosen language, naming this sieving algorithm after yourself as something
unique is stretching things... ``

First off, I became aware of what I call Prime Generators in 2008. All my
original development coding was done in Ruby (which I still use to prototype
with). Then in 2014 I (by default) used C++ to do the Segmented Sieve of Zakiya
(SSoZ), to be able to (back then on my laptop) use OpenMP for parallel
processing. When I learned about Nim, and saw it could do (actual)
multi-threaded processing, I began learning and using it. It's so much simpler
and more fun than C++ (much better syntax). So this has been an 11 year journey (so
far) in continual learning.

It's interesting you raise an issue of naming my work after myself. I mean
after all, if you look on Wikipedia, besides the SoE, it has pages for the
Sieve of Atkin, even though Daniel J. Bernstein as a grad student did all the
programming for it
([https://en.wikipedia.org/wiki/Sieve_of_Atkin](https://en.wikipedia.org/wiki/Sieve_of_Atkin)),
and the Sieve of Sundaram
([https://en.wikipedia.org/wiki/Sieve_of_Sundaram](https://en.wikipedia.org/wiki/Sieve_of_Sundaram)).
It was just logical to name the techniques after me so people knew who to
contact about them.

But, again, thanks for being interested enough to read the paper and run the
code. If you have any other questions or suggestions I'd be happy to hear them.
I'm very interested in other innovative and imaginative implementations.
