A lot to digest here.
1. I am aware of this. I am just scaffolding the library for now, to avoid doing
something fundamentally wrong (like reaching 80fps instead of >1000fps ;oP). In
the future, the library should be more general. In this case, I take advantage
of the fact that there is always 1 byte per sample. To be honest, I don't know
why the C++ version uses float. The
[API](http://www.vapoursynth.com/doc/api/vapoursynth.h.html#getreadptr) states
that both the read and write pointers are uint8. Working with floating-point
vectors is something that I should try in the future.
2. It is a bit like Chinese to me :O/. I understand that the left column is the C
generated from Nim, and the right column is the generated assembly. Is the
bottleneck due to the 22s for row1? I am surprised. At the end of
the day, I iterate first over rows and then over columns.
I used both strategies as can be seen here:
```nim
# Builds a Plane from a frame: start pointer and stride are fetched once here.
proc `[]`*(frame: ptr VSFrameRef, plane: cint): Plane =
  let ini = frame.getPtr(plane)
  let stride = frame.stride(plane)
  return Plane(ini: ini, stride: stride)
```
I will give it another shot as you suggest.
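
A minimal sketch of how a plane described by a start pointer and a stride can be walked row by row with 1 byte per sample. This is not the library code: `PlaneView`, `at` and `sumPlane` are hypothetical names, the `width` and `height` fields are assumptions, and a plain `seq[uint8]` stands in for the frame memory.

```nim
# Sketch only: a seq[uint8] stands in for the frame memory; `PlaneView`,
# `at` and `sumPlane` are hypothetical names, not part of VapourSynth.nim.
type
  PlaneView = object
    ini: ptr UncheckedArray[uint8]  # first byte of the plane
    stride: int                     # bytes from one row to the next
    width, height: int              # visible size in samples (assumed fields)

proc at(p: PlaneView, row, col: int): uint8 {.inline.} =
  ## Reads one 8-bit sample; padding bytes beyond `width` are never touched.
  p.ini[row * p.stride + col]

proc sumPlane(p: PlaneView): int =
  ## Rows in the outer loop, columns in the inner loop, so memory is
  ## walked sequentially.
  for row in 0 ..< p.height:
    for col in 0 ..< p.width:
      result += int(p.at(row, col))

# A 3x2 plane stored with a stride of 4 (one padding byte per row).
var buf = @[1'u8, 2, 3, 255, 4, 5, 6, 255]
let view = PlaneView(ini: cast[ptr UncheckedArray[uint8]](addr buf[0]),
                     stride: 4, width: 3, height: 2)
echo sumPlane(view)  # 21
```

The padding byte at the end of each row is skipped because the column loop stops at `width` while the row offset advances by `stride`.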
Regarding the triple iteration, I realized that, but I don't want to change
it (yet) until I get performance similar to the C++ version with a similar
algorithm.
I tried to implement a filter with
[ArrayMancer](https://github.com/mantielero/VapourSynth.nim/blob/master/src/filters/Mancer.nim)
but the results were worse.
My short-term objective is to reach on the order of 3000fps using
float32 and multithreading (staying close to the C++ algorithm).
The next step is to eliminate the triple loop over the rows and to see whether I
can reduce the number of calls to getStride and getPtr.
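
A rough sketch of what reducing those calls could look like: fetch the pointer and stride once, then do only pointer arithmetic inside the loops. Everything here is a stand-in for illustration. `FakeFrame`, its `calls` counter and these `getPtr`/`getStride` procs are hypothetical and only mimic the real accessors so that the number of calls can be counted.

```nim
# Sketch only: `FakeFrame`, `getPtr`, `getStride` and `calls` are stand-ins
# used to count accessor calls; the real VapourSynth accessors are not used.
type
  FakeFrame = object
    buf: seq[uint8]
    stride, width, height: int
    calls: int  # how many times an accessor was invoked

proc getPtr(f: var FakeFrame): ptr UncheckedArray[uint8] =
  inc f.calls
  cast[ptr UncheckedArray[uint8]](addr f.buf[0])

proc getStride(f: var FakeFrame): int =
  inc f.calls
  f.stride

proc sumPerPixel(f: var FakeFrame): int =
  ## Asks the frame for pointer and stride again for every sample.
  for row in 0 ..< f.height:
    for col in 0 ..< f.width:
      result += int(f.getPtr()[row * f.getStride() + col])

proc sumHoisted(f: var FakeFrame): int =
  ## Asks once, then only does pointer arithmetic inside the loops.
  let base = f.getPtr()
  let stride = f.getStride()
  for row in 0 ..< f.height:
    let line = cast[ptr UncheckedArray[uint8]](addr base[row * stride])
    for col in 0 ..< f.width:
      result += int(line[col])

var frame = FakeFrame(buf: @[1'u8, 2, 3, 0, 4, 5, 6, 0],
                      stride: 4, width: 3, height: 2)
let a = sumPerPixel(frame)
let callsPerPixel = frame.calls
frame.calls = 0
let b = sumHoisted(frame)
let callsHoisted = frame.calls
echo a, " ", b                            # 21 21
echo callsPerPixel, " vs ", callsHoisted  # 12 vs 2
```

Whether this matters in practice depends on how cheap the real accessors are and what the C compiler already hoists, so it needs to be measured rather than assumed.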
In the long term I'd like to try vectorization.