On Thursday, 11 October 2018 at 00:22:10 UTC, tide wrote:
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya
wrote:
I would like to include in my paper a good comparison of
various implementations in different compiled languages
(C/C++, D, Nim, etc.) to show how it performs with each.
If you want help with your paper, possibly some kind of decent
financial incentive would be appropriate. If the algorithm
benefits from more threads, then finding or creating an
implementation that runs on a GPU would probably be the true
performance test. CPUs have like 4-8 cores in the mainstream? A
GPU has hundreds, though with some limitations.
I'm writing the paper anyway (just like the others), so other
implementations are icing on the cake to show implementation
variations, as a benefit to readers. Maybe I could set up a website
and create a Rosetta Code repo for people to post their
different language implementations, and offer a T-shirt for the
fastest implementation. :-)
Yes, a GPU-based implementation would be the ideal fit for this
algorithm, by far. This is actually why I have brought the
algorithm to its current form, so that the number crunching can
all be done in parallel threads. (It would also be screamingly
fast done in hardware in an FPGA.) However, I only have
standard consumer-grade laptops. Hopefully someone with
sufficient hardware, interest, and time will take it upon
themselves to do this and publicize their results.