Thanks for taking time to run the code. First I'll comment on some of the technical issues you raised, then the others (not in order).
`Your results are as valid for AMD or other processors as they are for Intel...` Good. I'm glad to see it's seems consistent across different cpu systems. `` It is just a Sieve of Eratosthenes with a high degree of wheel factorization applied using residue bit planes as has been well known, at least by Berstein of Berstein and Atkin fame... `` Actually no. The algorithm attributed to Eratosthenes (more ancient societies/cultures knew how to identify primes way before him) is conceptually fundamentally different. 1) The Classical Sieve of Eratothenes (CSoE) needs to search the whole integer number space up to some N (see [https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes)](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes\)). The Optimized SoE (OSoe) seeks to reduce the number space (by skipping even number, factors of 3, 5, etc) but fundamentally still use as their starting/end points the whole number space up to N. The OSoEs talk nothing about Prime Generators, residues, etc. 2) My methods are fundamemtally different in conception and implementation. They are based on Prime Generator math. You always start with a greatly reduced number space that contains all the primes up to N, so you always inherently do less work (sieving prime multiples) to identify the primes. They are also modern, as they are inherently structuctured to be optimally done in parallel. I am not aware of anything in writing that comes close to describing the totality of an algorithm similar to mine, and certainly have not seen any coded implementation of such an algorithm. I would urge you to read my prior papers cited as references. `` However, you are right that it is a great algorithm and something that Kim Walisch of primesieve missed as he persisted in byte packing the PG5 residues into a byte "because it works out - eight values in eight bits" and missed that by so doing he is reducing the effective range of each CPU L1 cache range with a subsequent loss of efficiency. `` I first emailed Kim my "The Segmente Sieve of Zakiya (SSoZ)" when released in 2014 (7 years ago), and subsequent papers. He knows about my work, and if you start looking at his iterations of Primesieve you'll see it's incorporated aspects of it. And that's good! I was trying to make him aware of how to do his sieve better (and get him to do a C++ version of mine). But doing wheel factorization is just that, a means to reduce the number space starting from the total number space up to N. The optimum approach is to stay strictly within the number space of the Prime Generators, which is optimal by their structure for their given size. `` Now, I am in process of writing a Sieve of Eratosthenes that uses all of the optimizations I mention here and more that should be up to about 20% faster than primesieve even for just counting primes, and if I adapted your twin primes technique to that it would also find twin primes much much faster. It's also easy to see the optimizations that could make your code about twice as fast as it is currently... `` Great! I look forward to seeing your code, and benchmark results. There are two straightforward ways to further optimize my implementation: 1) Optimize the tuning algorithm to select the best segment size and PG for a given N or range, and 2) Give it more threads! In fact, this algorithm screams out for GPU implementations (Cuda/NVIDIA, Opencl, etc), and distributed networks (clouds). Ok, now to address your non-technical comments. `` While it's great that you are doing innovative things with Nim as your chosen language, naming this sieving algorithm after yourself as something unique is stretching things... `` First off, I became aware of what I call Prime Generators in 2008. All my original development coding was done in Ruby (which I still use to prototype with). Then in 2014 I (by default) used C++ to do the Segmented Sieve of Zakiya (SSoZ), to be able to (back then on my laptop) use OpenMP for parallel processing. When I learned about Nim, and saw it could do (actual) multi-threaded processing, I began learning and using it. Its so much simpler and fun than C++ (much better syntax). So this has been an 11 year journey (so far) in continual learning. It's interesting you raise an issue of naming my work after myself. I mean after all, if you look on wikipedia, besides the Soe, it has pages for the Sieve of Atkins, even though Daniel J. Bernstein as a grad student did all the programming for it ([https://en.wikipedia.org/wiki/Sieve_of_Atkin)](https://en.wikipedia.org/wiki/Sieve_of_Atkin\)), and Sieve of Sundaram ([https://en.wikipedia.org/wiki/Sieve_of_Sundaram)](https://en.wikipedia.org/wiki/Sieve_of_Sundaram\)). It was just logical to name the techniques after me so people knew who to contact about it. But, again, thanks for being interested enough to read the paper and run the code. If you have any other questions or suggetions I'd be happy to hear them. I'm very interested in other innovative and imaginative implementations.
