Thanks for taking time to run the code. First I'll comment on some of the 
technical issues you raised, then the others (not in order).

`Your results are as valid for AMD or other processors as they are for Intel...`

Good. I'm glad to see it's seems consistent across different cpu systems.

`` It is just a Sieve of Eratosthenes with a high degree of wheel factorization 
applied using residue bit planes as has been well known, at least by Berstein 
of Berstein and Atkin fame... `` Actually no. The algorithm attributed to 
Eratosthenes (more ancient societies/cultures knew how to identify primes way 
before him) is conceptually fundamentally different.

1) The Classical Sieve of Eratothenes (CSoE) needs to search the whole integer 
number space up to some N (see 
[https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes\)).
 The Optimized SoE (OSoe) seeks to reduce the number space (by skipping even 
number, factors of 3, 5, etc) but fundamentally still use as their starting/end 
points the whole number space up to N. The OSoEs talk nothing about Prime 
Generators, residues, etc.

2) My methods are fundamemtally different in conception and implementation. 
They are based on Prime Generator math. You always start with a greatly reduced 
number space that contains all the primes up to N, so you always inherently do 
less work (sieving prime multiples) to identify the primes. They are also 
modern, as they are inherently structuctured to be optimally done in parallel.

I am not aware of anything in writing that comes close to describing the 
totality of an algorithm similar to mine, and certainly have not seen any coded 
implementation of such an algorithm. I would urge you to read my prior papers 
cited as references.

`` However, you are right that it is a great algorithm and something that Kim 
Walisch of primesieve missed as he persisted in byte packing the PG5 residues 
into a byte "because it works out - eight values in eight bits" and missed that 
by so doing he is reducing the effective range of each CPU L1 cache range with 
a subsequent loss of efficiency. ``

I first emailed Kim my "The Segmente Sieve of Zakiya (SSoZ)" when released in 
2014 (7 years ago), and subsequent papers. He knows about my work, and if you 
start looking at his iterations of Primesieve you'll see it's incorporated 
aspects of it. And that's good! I was trying to make him aware of how to do his 
sieve better (and get him to do a C++ version of mine).

But doing wheel factorization is just that, a means to reduce the number space 
starting from the total number space up to N. The optimum approach is to stay 
strictly within the number space of the Prime Generators, which is optimal by 
their structure for their given size.

`` Now, I am in process of writing a Sieve of Eratosthenes that uses all of the 
optimizations I mention here and more that should be up to about 20% faster 
than primesieve even for just counting primes, and if I adapted your twin 
primes technique to that it would also find twin primes much much faster.

It's also easy to see the optimizations that could make your code about twice 
as fast as it is currently... ``

Great! I look forward to seeing your code, and benchmark results.

There are two straightforward ways to further optimize my implementation: 1) 
Optimize the tuning algorithm to select the best segment size and PG for a 
given N or range, and 2) Give it more threads! In fact, this algorithm screams 
out for GPU implementations (Cuda/NVIDIA, Opencl, etc), and distributed 
networks (clouds).

Ok, now to address your non-technical comments.

`` While it's great that you are doing innovative things with Nim as your 
chosen language, naming this sieving algorithm after yourself as something 
unique is stretching things... ``

First off, I became aware of what I call Prime Generators in 2008. All my 
original development coding was done in Ruby (which I still use to prototype 
with). Then in 2014 I (by default) used C++ to do the Segmented Sieve of Zakiya 
(SSoZ), to be able to (back then on my laptop) use OpenMP for parallel 
processing. When I learned about Nim, and saw it could do (actual) 
multi-threaded processing, I began learning and using it. Its so much simpler 
and fun than C++ (much better syntax). So this has been an 11 year journey (so 
far) in continual learning.

It's interesting you raise an issue of naming my work after myself. I mean 
after all, if you look on wikipedia, besides the Soe, it has pages for the 
Sieve of Atkins, even though Daniel J. Bernstein as a grad student did all the 
programming for it 
([https://en.wikipedia.org/wiki/Sieve_of_Atkin)](https://en.wikipedia.org/wiki/Sieve_of_Atkin\)),
 and Sieve of Sundaram 
([https://en.wikipedia.org/wiki/Sieve_of_Sundaram)](https://en.wikipedia.org/wiki/Sieve_of_Sundaram\)).
 It was just logical to name the techniques after me so people knew who to 
contact about it.

But, again, thanks for being interested enough to read the paper and run the 
code. If you have any other questions or suggetions I'd be happy to hear them. 
I'm very interested in other innovative and imaginative implementations.

Reply via email to