Was false sharing a factor, or does your parallelization mechanism take care of that?
Sent from my iPhone

On Aug 27, 2011, at 6:06 AM, dsimcha <[email protected]> wrote:

> On 8/27/2011 5:37 AM, Don wrote:
>> dsimcha wrote:
>>> == Quote from Rainer Schuetze ([email protected])'s article
>>>> The lexer used by Visual D is also CTFE capable:
>>>> http://www.dsource.org/projects/visuald/browser/trunk/vdc/lexer.d
>>>> As Timon pointed out, it will separate into D tokens, not the more
>>>> combined elements in your array.
>>>> Here's my small CTFE test:
>>>
>>> Thanks, but I've come to the conclusion that this lexer is way too big a
>>> dependency for something as small as parallel array ops, unless it
>>> were to be integrated into Phobos by itself. I'll just stick with the
>>> ugly syntax.
>>> Unfortunately, according to my benchmarks array ops may be so memory
>>> bandwidth-bound that parallelization doesn't yield very good speedups
>>> anyhow.
>>
>> Totally. Anything below BLAS3 is memory-limited, not CPU limited. Even
>> then, cache prefetching has as big an impact as number of processors.
>
> I think the "memory bandwidth-bound" statement actually applies to a lot of
> what I tried to do in std.parallel_algorithm. Much of it shows
> far-below-linear speedups, but it can't be explained by communication
> overhead because the speedup relative to the serial algorithm doesn't improve
> when I make the problem and work unit sizes huge.
