On 2011-06-15 16:57, Justin Whear wrote:
> Consider the following:
>
> You have 10 million data points and you need to apply a multipass
> algorithm to them. Each pass is like a cellular automaton: it can read
> from the previous pass but it doesn't know the "current" values. This
> makes the actual processing of each value trivially parallelizable. The
> actual operation for each value is fairly simple and cheap (essentially
> a multidimensional ancestor-child averaging operation).
>
> After each value has been operated on once, the pass is complete and
> the "current" and "old" buffers are switched (conceptually, the
> "current" buffer can only be written to, the "old" buffer can only be
> read--using __gshared here).
>
> The number of passes is not fixed; in the course of each value
> operation, an error is computed. When the worst individual error falls
> below a certain threshold, the algorithm is finished. Generally this
> will take between one thousand and ten thousand passes.
>
> How would you go about parallelizing this? My thought is to take the
> map/reduce approach within each pass: each thread/fiber takes a slice
> of the dataset, makes its modifications, then returns an error summary.
> These summaries are quickly combined and the algorithm loop decides
> whether to run again. Each pass shouldn't take more than a second or
> two, so I'm not sure whether introducing the overhead of spawning, say,
> 10 threads each pass is worthwhile (times 5000 passes). On the other
> hand, I have plenty of CPUs to throw at it (at least 16 cores, each
> with hyperthreading) and am in a situation where "as fast as possible"
> is important (while individual datasets may not grow, the number of
> them is).
>
> Any thoughts appreciated.
You should take a look at std.parallelism.

- Jonathan M Davis
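
A minimal sketch of how the per-pass map/reduce described above could look
with std.parallelism is below. The update rule, initial data, threshold,
and pass cap are stand-ins for the real workload; the point is the shape
of the loop: a parallel foreach writes the "current" buffer while reading
only the "old" one, worker-local error maxima are combined after the pass,
and the buffers are swapped.

import std.algorithm : max;
import std.math : abs;
import std.parallelism : parallel, taskPool;
import std.stdio : writeln;

void main()
{
    // Toy stand-ins: the dataset size, update rule, initial data,
    // threshold, and pass cap are placeholders, not the real
    // ancestor-child averaging operation.
    enum size_t n = 10_000_000;
    enum double threshold = 1e-9;
    enum size_t maxPasses = 10_000;

    auto oldBuf = new double[n];
    auto curBuf = new double[n];
    foreach (i, ref v; oldBuf)
        v = cast(double) i;

    size_t passes;
    double worstError;
    do
    {
        // One running maximum per worker thread, so the error "reduce"
        // step doesn't contend on shared state during the pass.
        auto localMax = taskPool.workerLocalStorage(0.0);

        // parallel() carves the array into work units and runs them on
        // taskPool's threads, which are created once and reused across
        // passes. (A work-unit size can be passed as a second argument.)
        foreach (i, ref cur; parallel(curBuf))
        {
            // Placeholder update: read only from oldBuf, write only curBuf.
            immutable left  = oldBuf[i == 0 ? i : i - 1];
            immutable right = oldBuf[i + 1 == n ? i : i + 1];
            cur = 0.5 * (left + right);

            localMax.get = max(localMax.get, abs(cur - oldBuf[i]));
        }

        // Combine the per-worker error summaries into the pass result.
        worstError = 0.0;
        foreach (e; localMax.toRange)
            worstError = max(worstError, e);

        // Swap buffers: this pass's output is the next pass's input.
        auto tmp = oldBuf;
        oldBuf = curBuf;
        curBuf = tmp;

        ++passes;
    } while (worstError > threshold && passes < maxPasses);

    writeln(passes, " passes, worst error ", worstError);
}

Since taskPool's worker threads are created once and reused, the per-pass
cost is distributing work units and joining, not spawning ten threads per
pass times 5000 passes. The parallel foreach blocks until the pass is
complete, so the buffer swap and the convergence check stay sequential and
simple.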