Re: A Friendly Challenge for D

Jabari Zakiya via Digitalmars-d Mon, 15 Oct 2018 13:51:01 -0700

I don't actually understand the underlying algorithm, but I atleast understand the flow of the program and the structure.The algorithm utilized depends heavily on using shared memoryaccess, which can be done in D, but I definitely wouldn't callit idiomatic. In D, message passing is preferred, but itreally can't be well demonstrated on your algorithm without adeeper understanding of the algorithm itself.
A complete working version can be found at:https://gist.github.com/vnayar/79e2d0a9850833b8859dd9f08997b4d7
I modified both versions of the program to utilize thepgParameters13 for more of an apples-to-apples comparison.
The final results are as follows:
$ nim c --cc:gcc --d:release --threads:on twinprimes_ssoz.nim&& echo "3000000000" | ./twinprimes_ssoz
Enter integer number: threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 0.000 secs
perform twinprimes ssoz sieve
sieve time = 0.222 secs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/-1
total time = 0.223 secs
$ dub build --compiler=ldc2 -b=release && echo "3000000000" |./twinprimes
Enter integer number:
threads = 8
each thread segment is [1 x 65536] bytes array
twinprime candidates = 175324676; resgroups = 1298702
each 135 threads has nextp[2 x 5566] array
setup time = 1 ms, 864 μs, and 7 hnsecs
perform twinprimes ssoz sieve
sieve time = 196 ms, 566 μs, and 5 hnsecs
last segment = 53518 resgroups; segment slices = 20
total twins = 9210144; last twin = 2999999712+/- 1
total time = 198 ms, 431 μs, and 2 hnsecs
My understanding is that the difference in performance islargely due to slightly better optimization from the LLVM basedldc2 compiler, where I believe Nim is using gcc.


Hey Great!  I'll take some time and study your code.

Nim currently allows using gcc or clang: --cc:gcc or --cc:clang,and

compiles to C: nim c --cc:gcc... or C++: nim c++ --cc:gcc...

You can compile in Nim with/out using garbage collection (GC) andchoose GC.This version needs GC to recover mem created in each thread, orit will eatit up. See operation of program using htop/top to see threads|memat work.

If D has a fast bitarray structure, try it to create the dataarrays"prms" in proc "sozpg" and "seg" arrays is "twin_sieves". Thiscould allowfor larger segment sizes|arrays that fit into cpu cache, reducingthe numberof segment slices to process. This would all have to be profiledandencapsulated in "selectPG", which adaptively selects to best PGto use.


Here is a table of output data and performance times (in seconds).

The results were produced on a System76 laptop with an Intel I76700HQ cpu,2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. I compiledthe file"twinprimes_ssoz.nim" with Nim versions 0.18.0, and the justreleased 0.19.0,using gcc 8.2.1 with Manjaro (Linux) in Virtual Box. I then ranthe executableson my host Linux system (which only has gcc-4.9.8) to use all 8threads.

The output was produced on a "quiet" system, where I turned offthe laptop andrebooted, and ran tests with only a terminal and spreedsheet open(to record

results). The times are the "best" times from multiple runs.

I also compare times from the program "primesieve" -https://primesieve.org/

which states on its site:

------------------------------------------------------------------------------------

primesieve is a free software program and C/C++ library thatgenerates primes using a highly optimized sieve of Eratosthenesimplementation. It counts the primes below 10^10 in just 0.45seconds on an Intel Core i7-6700 CPU (4 x 3.4 GHz). primesievecan generate primes and prime k‑tuplets up to 2^64.

------------------------------------------------------------------------------------

(An optimized C/C++ versions of twinprimes_ssoz would be nice tocompare against too.)

Number | Twimprimes | Last Twinprime |twinprime_ssoz.nim,gcc-8.2.1| primesieve || | | 0.18.0 |0.19.0 | 9.7.1 |

----------|------------|----------------|--------------|--------------|------------|

1 x 10^09 | 3424506 | 1999999872 | 0.043 |0.041 | 0.049 |

----------|------------|----------------|--------------|--------------|------------|

5 x 10^09 | 14618166 | 4999999860 | 0.219 |0.226 | 0.248 |

----------|------------|----------------|--------------|--------------|------------|

1 x 10^10 | 27412679 | 9999999702 | 0.398 |0.401 | 0.533 |

----------|------------|----------------|--------------|--------------|------------|

5 x 10^10 | 118903682 | 49999999590 | 2.364 |2.331 | 2.978 |

----------|------------|----------------|--------------|--------------|------------|

1 x 10^11 | 224376048 | 99999999762 | 4.586 |4.476 | 6.189 |

----------|------------|----------------|--------------|--------------|------------|

5 x 10^11 | 986222314 | 499999999062 | 26.625 |26.578 | 35.120 |

----------|------------|----------------|--------------|--------------|------------|

1 x 10^12 | 1870585220 | 999999999960 | 57.895 |58.195 | 73.966 |

----------|------------|----------------|--------------|--------------|------------|

5 x 10^12 | 8312493003 | 4999999999878 | 385.562 |389.347 | 433.320 |

----------|------------|----------------|--------------|--------------|------------|

1 x 10^13 |15834664872 | 9999999998490 | 855.705 |857.771 | 934.253 |

----------|------------|----------------|--------------|--------------|------------|

2 x 10^13 |30198862775 | 19999999999830 | 1806.282 |1818.766 | 1998.341 |

----------|------------|----------------|--------------|--------------|------------|

A typical output of the "twinprimes_ssoz.nim" files looks asbelow"


$ echo 123_345_789_000 | ./twinprimes_test7yc.0180.gcc821
Enter integer number: threads = 8

each thread segment is [1 x 98304] bytes array // 98,304segment size (Kn)twinprime candidates = 6099517038; resgroups = 4107419 // maxtwins; max cols (KB)each 1485 threads has nextp[2 x 30062] array //30,062, primes <= sqrt(N)setup time = 0.006 secs // timefor primes|ssoz setupperform twinprimes ssoz sieve // ssozstarts nowsieve time = 5.771 secs // timeto do just the ssozlast segment = 76955 resgroups; segment slices = 42 // lastseg cols; seg loopstotal twins = 272018910; last twin = 123345788640+/-1 //twinprimes, val for lasttotal time = 5.778 secs // setup+ ssoz time

You can determine the selected PG by the number of threads beingused.

I'm writing up the paper, and will probably release the programoperation descriptionto provide a detailed understanding of how|why it works, whichwill allow for modification.

Re: A Friendly Challenge for D

Reply via email to