On 5 September 2016 at 20:32, Johannes Pfau via Digitalmars-d wrote:
> On Mon, 5 Sep 2016 10:21:53 +0200
> Andrei Alexandrescu wrote:
>
>> On 9/5/16 7:08 AM, Manu via Digitalmars-d wrote:
>> > I mostly code like this now:
>> >   data.map!(x => transform(x)).copy(output);
>> >
>> > It's convenient and reads nicely, but it's generally inefficient.
>>
>> What are the benchmarks and the numbers? What loss are you looking
>> at? -- Andrei
>
> As Manu posted this question (and he's working on a color/image library)
> it's not hard to guess one problem is SIMD/vectorization. E.g. if
> transform(x) => x + 2; it is faster to perform 1 SIMD operation on 4
> values instead of 4 individual adds.
>
> As autovectorization is not very powerful in current compilers I can
> easily imagine that complex range based examples can't compete with
> hand-written SIMD loops.
>
> @Manu: Have you had a look at std.parallelism? I think it has some kind
> of parallel map which could provide some inspiration?

I have. Even just chunks() and joiner() can do the trick to a reasonable extent, but it's still not great, and it's definitely not where I'd like it to be. End users won't deploy these strategies correctly by hand (or at all), so I'd like to see a design that enables more automatic deployment of batch processing. I treat the end user like a JavaScript user: they shouldn't need to do hard work to make proper use of a lib. Requiring that is a poor API offering on the part of the lib author.
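
To make the chunks()/joiner() idea concrete, here is a minimal sketch of what I mean. The names transformBatch and batchedTransform and the batch size of 4 are made up for illustration, and a plain array operation stands in for a real SIMD kernel, so treat it as a rough outline under those assumptions rather than a tuned implementation:

```d
import std.algorithm.iteration : joiner, map;
import std.algorithm.mutation : copy;
import std.range : chunks;

// Hypothetical batch kernel: processes a whole slice per call.
// The array-wise add stands in for a hand-written SIMD routine.
// It allocates per batch, which real code would want to avoid.
int[] transformBatch(int[] batch)
{
    auto result = new int[batch.length];
    result[] = batch[] + 2; // same operation as transform(x) => x + 2
    return result;
}

// Hypothetical wrapper: split the input into batches of up to 4,
// transform each batch, then flatten the batches back into one range.
int[] batchedTransform(int[] data)
{
    auto output = new int[data.length];
    data.chunks(4)
        .map!(c => transformBatch(c))
        .joiner
        .copy(output);
    return output;
}

void main()
{
    // 5 elements: one full batch of 4 plus a short tail batch of 1.
    auto result = batchedTransform([1, 2, 3, 4, 5]);
    assert(result == [3, 4, 5, 6, 7]);
}
```

This works, but the user has to know to write the chunks/map/joiner dance themselves and pick a batch size, which is exactly the kind of manual deployment I don't expect end users to get right.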
