On 7 September 2016 at 12:00, finalpatch via Digitalmars-d
<[email protected]> wrote:
> On Wednesday, 7 September 2016 at 01:38:47 UTC, Manu wrote:
>>
>> On 7 September 2016 at 11:04, finalpatch via Digitalmars-d
>> <[email protected]> wrote:
>>>
>>> It shouldn't be hard to have the framework look at the buffer size and
>>> choose the scalar version when number of elements are small, it wasn't done
>>> that way simply because we didn't need it.
>>
>> No, what's hard is working this into D's pipeline patterns seamlessly.
>
> The lesson I learned from this is that you need the user code to provide a
> lot of extra information about the algorithm at compile time for the
> templates to work out a way to fuse pipeline stages together efficiently.
>
> I believe it is possible to get something similar in D because D has more
> powerful templates than C++ and D also has some type introspection which C++
> lacks. Unfortunately I'm not as good on D so I can only provide some ideas
> rather than actual working code.
>
> Once this problem is solved, the benefit is huge. It allowed me to perform
> high level optimizations (streaming load/save, prefetching, dynamic
> dispatching depending on data alignment etc.) in the main loop which
> automatically benefits all kernels and pipelines.
Exactly!
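A minimal C++ sketch of the two ideas quoted above, assuming a hypothetical framework. Stages are described to the compiler as types, so a template can fuse them into a single pass over the data, and the driver falls back to a plain scalar loop when the buffer is small. All names (`Scale`, `Offset`, `Fused`, `fuse`, `run`, `kSmallThreshold`, `kBlock`) are made up for illustration. The threshold of 16 is arbitrary, and the 4-element inner loop only stands in for real SIMD code. This is not finalpatch's actual framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pipeline stages: each exposes a per-element operation
// whose type is fully known at compile time.
struct Scale {
    float factor;
    float operator()(float x) const { return x * factor; }
};

struct Offset {
    float delta;
    float operator()(float x) const { return x + delta; }
};

// Fuse two stages so the buffer is traversed once instead of twice.
template <typename A, typename B>
struct Fused {
    A a;
    B b;
    float operator()(float x) const { return b(a(x)); }
};

template <typename A, typename B>
Fused<A, B> fuse(A a, B b) { return Fused<A, B>{a, b}; }

// Hypothetical tuning constants (assumptions, not measured values).
constexpr std::size_t kSmallThreshold = 16;  // below this, use the scalar path
constexpr std::size_t kBlock = 4;            // stand-in for a SIMD width

template <typename Kernel>
void run_scalar(const Kernel& k, const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = k(in[i]);
}

template <typename Kernel>
void run_blocked(const Kernel& k, const float* in, float* out, std::size_t n) {
    std::size_t i = 0;
    // Main loop over whole blocks: this is where a real framework could
    // apply streaming loads/stores, prefetching or alignment dispatch
    // once, for every kernel.
    for (; i + kBlock <= n; i += kBlock) {
        for (std::size_t j = 0; j < kBlock; ++j) out[i + j] = k(in[i + j]);
    }
    // Scalar tail for the leftover elements.
    run_scalar(k, in + i, out + i, n - i);
}

// Driver: chooses the scalar or blocked path from the buffer size.
template <typename Kernel>
void run(const Kernel& k, const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    assert(out.size() == in.size());  // output buffer matches input length
    if (in.size() < kSmallThreshold)
        run_scalar(k, in.data(), out.data(), in.size());
    else
        run_blocked(k, in.data(), out.data(), in.size());
}
```

For example, `run(fuse(Scale{2.0f}, Offset{1.0f}), in, out)` computes `2*x + 1` for every element in a single traversal. A 3-element input takes the scalar path, and a 40-element input takes the blocked path. The point of the design is that the kernels stay trivial while the driver owns every loop-level optimization.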
