On 7 September 2016 at 07:11, finalpatch via Digitalmars-d wrote:
> On Tuesday, 6 September 2016 at 14:47:21 UTC, Manu wrote:
>
>>> with a main loop that reads the source buffer in *12*-pixel steps, call
>>> MySimpleKernel 3 times, then call AnotherKernel 4 times.
>>
>> Interesting thoughts. What did you do when buffers weren't a multiple
>> of the kernel width?
>
> The end of a scan line is special-cased. If I need 12 pixels for the last
> iteration but there are only 8 left, an instance of Kernel::InputVector is
> allocated on the stack, the 8 remaining pixels are memcpy'd into it, and it
> is then sent to the kernel. Output from the kernel is also assigned to a
> stack variable first, then the 8 pixels are memcpy'd to the output buffer.
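The tail handling finalpatch describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not his actual code: `DoubleKernel`, its 4-pixel `Width`, the `OutputVector` type and `processScanline` are all hypothetical, pixels are plain `float`s, and the main loop steps one kernel width at a time rather than 12 pixels across several kernels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical kernel that always processes exactly Width pixels per call.
struct DoubleKernel {
    static constexpr std::size_t Width = 4;
    using InputVector = std::array<float, Width>;
    using OutputVector = std::array<float, Width>;

    static OutputVector apply(const InputVector& in) {
        OutputVector out{};
        for (std::size_t i = 0; i < Width; ++i) out[i] = in[i] * 2.0f;
        return out;
    }
};

template <typename Kernel>
void processScanline(const float* src, float* dst, std::size_t count) {
    assert(count == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;

    // Main loop: whole kernel-width steps loaded directly from the source.
    for (; i + Kernel::Width <= count; i += Kernel::Width) {
        typename Kernel::InputVector in;
        std::memcpy(in.data(), src + i, Kernel::Width * sizeof(float));
        typename Kernel::OutputVector out = Kernel::apply(in);
        std::memcpy(dst + i, out.data(), Kernel::Width * sizeof(float));
    }

    // Tail: fewer than Width pixels remain at the end of the scan line.
    const std::size_t remaining = count - i;
    if (remaining > 0) {
        typename Kernel::InputVector in{};  // zero-padded buffer on the stack
        std::memcpy(in.data(), src + i, remaining * sizeof(float));
        typename Kernel::OutputVector out = Kernel::apply(in);  // output to a stack variable
        // Copy back only the valid pixels; the padded lanes are discarded.
        std::memcpy(dst + i, out.data(), remaining * sizeof(float));
    }
}
```

The kernel itself never needs to know about short inputs; the cost is two extra copies for the final partial step of each scan line.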
Right, and this is a classic problem with this sort of function; it is only more efficient if numElements is suitably long. See, I often wonder whether it would be worth being able to provide both a scalar and an array version of a function, and have the algorithms select between them intelligently.
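One way the "provide both versions" idea might look is sketched below, purely as an illustration. The `Square` operation, its `BlockWidth` of 8, the `MinArrayLength` break-even threshold of 32 and the `mapElements` algorithm are hypothetical names and values chosen for the example. The algorithm uses the block (array) form only when the input is long enough to amortize it, and finishes any remainder with the scalar form, which avoids the padded stack buffer entirely.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical operation that offers both a per-element and a per-block form.
struct Square {
    static constexpr std::size_t BlockWidth = 8;       // assumed block width
    static constexpr std::size_t MinArrayLength = 32;  // assumed break-even length

    static float scalar(float x) { return x * x; }

    // Processes exactly BlockWidth elements; stands in for a SIMD implementation.
    static void block(const float* in, float* out) {
        for (std::size_t i = 0; i < BlockWidth; ++i) out[i] = in[i] * in[i];
    }
};

// Generic algorithm that chooses between the two forms based on input length.
template <typename Op>
void mapElements(const float* in, float* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    std::size_t i = 0;

    // Long inputs: run the block form over every whole block.
    if (n >= Op::MinArrayLength) {
        for (; i + Op::BlockWidth <= n; i += Op::BlockWidth) {
            Op::block(in + i, out + i);
        }
    }

    // Short inputs, and the tail of long ones: fall back to the scalar form.
    for (; i < n; ++i) out[i] = Op::scalar(in[i]);
}
```

In practice the threshold would have to be measured per operation and per target; hard-coding it as a member constant is just the simplest way to let a generic algorithm query it.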
