On Tuesday, 6 September 2016 at 14:47:21 UTC, Manu wrote:

with a main loop that reads the source buffer in steps of *12* pixels, calls
MySimpleKernel 3 times, then calls AnotherKernel 4 times.

These are interesting thoughts. What did you do when the buffer length wasn't a multiple of the kernel width?

The end of a scan line is special-cased. If I need 12 pixels for the last iteration but there are only 8 left, an instance of Kernel::InputVector is allocated on the stack, the 8 remaining pixels are memcpy'd into it, and it is then sent to the kernel. The kernel's output is likewise assigned to a stack variable first, and then 8 pixels are memcpy'd from it to the output buffer.
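
A minimal C++ sketch of that tail handling, for illustration only. `DoubleKernel`, `kWidth`, `OutputVector`, `run` and `process_scanline` are hypothetical names, not the original code; the 16-bit pixel type and the zero padding of the unused lanes are also assumptions. For brevity the full-width iterations go through the same stack vectors, whereas the real main loop reads the source buffer directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical kernel: consumes and produces 12 pixels per call.
struct DoubleKernel {
    static constexpr std::size_t kWidth = 12;
    using InputVector  = std::array<std::uint16_t, kWidth>;
    using OutputVector = std::array<std::uint16_t, kWidth>;

    // Stand-in for real work: doubles every pixel.
    static OutputVector run(const InputVector& in) {
        OutputVector out{};
        for (std::size_t i = 0; i < kWidth; ++i)
            out[i] = static_cast<std::uint16_t>(in[i] * 2);
        return out;
    }
};

template <typename Kernel>
void process_scanline(const std::uint16_t* src, std::uint16_t* dst,
                      std::size_t count) {
    assert(src != nullptr && dst != nullptr);
    constexpr std::size_t w = Kernel::kWidth;
    constexpr std::size_t px = sizeof(std::uint16_t);

    // Main loop: only full-width steps.
    std::size_t i = 0;
    for (; i + w <= count; i += w) {
        typename Kernel::InputVector in;
        std::memcpy(in.data(), src + i, w * px);
        typename Kernel::OutputVector out = Kernel::run(in);
        std::memcpy(dst + i, out.data(), w * px);
    }

    // Special-cased end of the scan line: fewer than w pixels remain.
    const std::size_t rem = count - i;
    if (rem > 0) {
        typename Kernel::InputVector in{};  // stack copy, zero padded (assumption)
        std::memcpy(in.data(), src + i, rem * px);
        typename Kernel::OutputVector out = Kernel::run(in);  // stack result
        std::memcpy(dst + i, out.data(), rem * px);  // copy back only rem pixels
    }
}
```

With a 20-pixel scan line this runs one full 12-pixel step and then the special case with 8 pixels, so the kernel never reads or writes past the end of either buffer.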
