On Wed, 15 Jun 2011 23:57:25 +0000, Justin Whear wrote:
> Consider the following:
>
> You have 10 million data points and you need to apply a multipass
> algorithm to them. Each pass is like a cellular automata: it can read
> from the previous pass but it doesn't know the "current" values. This
> makes the actual processing of each value trivially parallelizable.
> The actual operation for each value is fairly simple and cheap
> (essentially a multidimensional ancestor-child averaging operation).
>
> After each value has been operated on once, the pass is complete and
> the "current" and "old" buffers are switched (conceptually, the
> "current" buffer can only be written to, the "old" buffer can only be
> read--using __gshared here).
>
> The number of passes is not fixed; in the course of each value
> operation, an error is computed. When the worst individual error falls
> below a certain threshold, the algorithm is finished. Generally this
> will take between one thousand and ten thousand passes.
>
> How would you go about parallelizing this? My thought is to take the
> map/reduce approach within each pass: each thread/fiber takes a slice
> of the dataset, makes its modifications, then returns an error
> summary. These summaries are quickly combined and the algorithm loop
> decides whether to run again. Each pass shouldn't take more than a
> second or two, so I'm not sure whether introducing the overhead of
> spawning, say, 10 threads each pass is worthwhile (times 5000 passes).
> On the other hand, I have plenty of CPUs to throw at it (at least 16
> cores, each with hyperthreading) and am in a situation where "as fast
> as possible" is important (while individual datasets may not grow, the
> number of them is).
>
> Any thoughts appreciated.
I would recommend you take a look at the new std.parallelism module,
which was introduced in the most recent DMD release (2.053):

http://www.d-programming-language.org/phobos-prerelease/std_parallelism.html
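In case a concrete shape helps, here is a rough, untested sketch of what
the pass loop could look like with std.parallelism. The names
(updatePoint, relax) and the one-dimensional neighbour averaging are
placeholders for your actual multidimensional ancestor-child operation;
only the structure is the point: a parallel foreach over fixed chunks of
the dataset, a per-chunk worst error, a cheap serial combine, then a
buffer swap.

import std.algorithm : max;
import std.math : abs;
import std.parallelism;
import std.range : iota;

// Placeholder for the real per-point operation (your multidimensional
// ancestor-child averaging). Reads only from `old`, writes only to
// `cur`, and returns the error for this point.
double updatePoint(const(double)[] old, double[] cur, size_t i)
{
    immutable left  = i == 0                ? old[i] : old[i - 1];
    immutable right = i == old.length - 1   ? old[i] : old[i + 1];
    cur[i] = 0.5 * (left + right);
    return abs(cur[i] - old[i]);
}

// Runs passes until the worst per-point error drops below `threshold`.
void relax(double[] bufA, double[] bufB, double threshold)
{
    auto old = bufA;
    auto cur = bufB;
    enum nChunks = 64;                    // a few chunks per core
    auto chunkErr = new double[nChunks];  // worst error seen in each chunk

    double worst;
    do
    {
        immutable n = old.length;

        // One pass: chunks are handed out to the task pool's workers.
        foreach (c; parallel(iota(nChunks), 1))
        {
            immutable lo = c * n / nChunks;
            immutable hi = (c + 1) * n / nChunks;
            double e = 0.0;
            foreach (i; lo .. hi)
                e = max(e, updatePoint(old, cur, i));
            chunkErr[c] = e;
        }

        // Cheap serial reduction over a handful of chunk errors.
        worst = 0.0;
        foreach (e; chunkErr)
            worst = max(worst, e);

        // Swap the read-only and write-only buffers for the next pass.
        auto tmp = old;
        old = cur;
        cur = tmp;
    } while (worst > threshold);
}

Regarding your worry about spawning threads every pass: taskPool is
created lazily on first use and its worker threads persist, so a
parallel foreach per pass only costs handing out the chunks, not
spawning threads 5000 times. The number of workers defaults to
totalCPUs - 1 (the thread calling parallel also takes part) and can be
tuned via defaultPoolThreads. Also, because the foreach body is a
delegate that captures the two slices, you shouldn't need __gshared for
the buffers in this setup.

-Lars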