On Sunday, 26 December 2021 at 06:10:03 UTC, Era Scarecrow wrote:
This is curious. I was up for trying to parallelize my code,
specifically having a block of code calculate some polynomials
(*related to Reed-Solomon stuff*). So I cracked open
std.parallelism and looked over how I would manage it all.
To my surprise I found ParallelForeach, which gives the
example of:
```d
foreach (value; taskPool.parallel(range)) { /* loop body */ }
```
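For reference, here is a concrete, runnable form of that pattern. It is a minimal sketch that assumes the work is indexed by bit width via `iota`; the names `runWidths` and `processed` are hypothetical, and the loop body is a placeholder. It touches on both follow-up questions: anything shared between iterations needs synchronization (here an atomic counter from `core.atomic`), and a parallel `foreach` does not return until every iteration has finished, so no separate "wait" step is required.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical counter shared by all worker threads (illustration only).
shared ulong processed;

// Runs the docs pattern over iota(lo, hi) and reports how many
// iterations finished once the parallel loop has returned.
ulong runWidths(int lo, int hi)
{
    atomicStore(processed, 0UL);

    foreach (width; taskPool.parallel(iota(lo, hi)))
    {
        // ... the per-width polynomial work would go here ...

        // State touched by several threads must be synchronized;
        // an atomic increment is the simplest case.
        atomicOp!"+="(processed, 1);
    }

    // A parallel foreach returns only after every iteration has
    // completed, so all increments are visible at this point.
    return atomicLoad(processed);
}

void main()
{
    writeln(runWidths(7, 16)); // 9
}
```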
Since my code doesn't involve any memory management, shared
resources or race conditions (*other than stdout*), I plugged
in an iota and gave it a go. To my amazement there were no
compile errors, all my cores are in heavy use, and it's
outputting results!
Now, said results are out of order (*and the early results
are garbled on stdout*), but I'd included a bit-width comment,
so sorting should be easy.
```d
0x3, /*7*/
0x11, /*9*/
0x9, /*10*/
0x1D, /*8*/
0x5, /*11*/
0x3, /*15*/
0x53, /*12*/
0x1B, /*13*/
0x2B, /*14*/
```
etc etc.
Years ago, I remember having to make a struct and then pass
a function and a bunch of other state from within the struct;
it often broke and was hard to get working at all, so I hardly
touched this stuff. This makes generating the data MUCH faster
and so easy. Well, at least on a beefy computer, and not just
the Chromebook I'm programming on so I can work on the go.
So I suppose my question is: is there anything I need to know
about shared resources, or about how to wait until all threads
are done?
Parallel programming is one of the deepest rabbit holes you can
actually get to use in practice. Your question doesn't have much
context at the moment, so it's difficult to suggest exactly
where you should go.
I would start by removing the use of stdout from your loop
kernel. I'm not familiar with what you are calculating, but if
you can have the (parallel) loop operate directly from (say) one
array into another, you can get extremely good parallel scaling
with almost no effort.
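A minimal sketch of that array-to-array shape, assuming the per-element calculation can be written as a function of one input value. `kernel` and `computeAll` are hypothetical names and the arithmetic is a placeholder. Each iteration reads only its own input and writes only its own output slot, so nothing needs locking and the results stay in input order.

```d
import std.parallelism : taskPool;
import std.stdio : writeln;

// Hypothetical per-element kernel; substitute the real calculation.
uint kernel(uint x) pure nothrow @safe
{
    return x * x + 1;
}

// Reads only from `input` and writes only to its own slot of `output`,
// so no locks or shared state are needed.
void computeAll(const(uint)[] input, uint[] output)
{
    assert(input.length == output.length);
    foreach (i, ref o; taskPool.parallel(output))
    {
        o = kernel(input[i]);
    }
}

void main()
{
    auto input = [1u, 2, 3, 4];
    auto output = new uint[input.length];
    computeAll(input, output);
    writeln(output); // [2, 5, 10, 17]
}
```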
Not using stdout in the actual loop should make the code faster
even without threads, because a function call in the hot code
will make the compiler's optimizer give up on certain
transformations. In short: do all the work as compactly as
possible, then output the data in one step at the end.
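One way to follow that advice with std.parallelism is `taskPool.amap`, which evaluates a function over a random-access range in parallel and returns the results as an array in input order. The sketch below assumes a hypothetical `polyFor` as a stand-in for the real per-width calculation (its placeholder formula is not a Reed-Solomon computation) and a hypothetical `report` helper that formats everything into one string, which is written once after the parallel work is done. Because the order is preserved, sorting by the bit-width comment is no longer necessary.

```d
import std.array : appender;
import std.format : formattedWrite;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : write;

// Hypothetical stand-in for the per-bit-width search; the real
// calculation is not shown in the thread.
uint polyFor(int width)
{
    return (1u << width) | 1u;
}

// Compute everything first: amap returns results in input order,
// so no sorting by bit width is needed afterwards.
string report(int lo, int hi)
{
    auto widths = iota(lo, hi);
    auto polys = taskPool.amap!polyFor(widths);

    // Build the whole output with no I/O inside the parallel part.
    auto buf = appender!string();
    foreach (i, p; polys)
        buf.formattedWrite("0x%X, /*%d*/\n", p, widths[i]);
    return buf.data;
}

void main()
{
    // One write at the very end.
    write(report(7, 16));
}
```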