On Tuesday, 6 September 2016 at 14:47:21 UTC, Manu wrote:
with a main loop that reads the source buffer in steps of *12*
pixels, calls MySimpleKernel 3 times, then calls AnotherKernel
4 times.
Those are interesting thoughts. What did you do when the buffer
length wasn't a multiple of the kernel size?
The end of a scan line is special-cased. If I need 12 pixels for
the last iteration but only 8 are left, an instance of
Kernel::InputVector is allocated on the stack, the 8 remaining
pixels are memcpy'd into it, and it is then sent to the kernel.
The kernel's output is likewise assigned to a stack variable
first, and then 8 pixels are memcpy'd to the output buffer.
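
To make the tail handling concrete, here is a minimal sketch of the idea, not the actual code. Everything in it except the name InputVector is an assumption made for illustration: the InvertKernel type, its kPixels constant, the OutputVector type, the run() function, the processScanLine() helper, and the choice of 8-bit single-channel pixels. The real main loop may well read the source buffer directly instead of copying each full block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical kernel: consumes and produces fixed blocks of 12 pixels.
// It simply inverts 8-bit pixels; the real kernels are not shown in the post.
struct InvertKernel {
    static constexpr std::size_t kPixels = 12;
    struct InputVector  { std::uint8_t px[kPixels]; };
    struct OutputVector { std::uint8_t px[kPixels]; };

    static OutputVector run(const InputVector& in) {
        OutputVector out;
        for (std::size_t i = 0; i < kPixels; ++i)
            out.px[i] = static_cast<std::uint8_t>(255 - in.px[i]);
        return out;
    }
};

// Hypothetical driver: processes one scan line of `width` pixels.
template <typename Kernel>
void processScanLine(const std::uint8_t* src, std::uint8_t* dst,
                     std::size_t width) {
    constexpr std::size_t step = Kernel::kPixels;
    std::size_t x = 0;

    // Main loop: only full blocks of `step` pixels.
    for (; x + step <= width; x += step) {
        typename Kernel::InputVector in;
        std::memcpy(in.px, src + x, step);
        const typename Kernel::OutputVector out = Kernel::run(in);
        std::memcpy(dst + x, out.px, step);
    }

    // Special-cased end of the scan line: fewer than `step` pixels remain.
    const std::size_t remaining = width - x;
    assert(remaining < step);
    if (remaining > 0) {
        // Stack-allocated input, zero-initialised so the unused lanes
        // hold defined values.
        typename Kernel::InputVector in{};
        std::memcpy(in.px, src + x, remaining);

        // The kernel writes a full block to a stack variable first...
        const typename Kernel::OutputVector out = Kernel::run(in);

        // ...and only the valid pixels are copied to the output buffer.
        std::memcpy(dst + x, out.px, remaining);
    }
}
```

With a 20-pixel scan line, processScanLine<InvertKernel>(src, dst, 20) runs one full 12-pixel block and then the tail path for the 8 remaining pixels. The kernel always sees a full-size vector, and nothing is read or written past the end of either buffer.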