A lot to digest here.

  1. I am aware of this. I am just scaffolding the library for now, to avoid 
doing something fundamentally wrong (like reaching 80fps instead of >1000fps 
;oP). In the future, the library should be more general. For now, I take 
advantage of the fact that it is always 1 byte per sample. To be honest, I 
don't know why the C++ version uses float. The 
[API](http://www.vapoursynth.com/doc/api/vapoursynth.h.html#getreadptr) states 
that both the read and write pointers are uint8. Floating-point vectors are 
something that I should try in the future.
  2. It is a bit like Chinese to me :O/. I understand that the left column is 
the C generated from Nim, and the right column is the generated assembly. Is 
the bottleneck the 22s for row1? I am surprised. At the end of the day, I 
iterate first over rows and then over columns.



I used both strategies as can be seen here:
    
    
    proc `[]`*(frame:ptr VSFrameRef, plane:cint ):Plane =
      let ini = frame.getPtr(plane)
      let stride = frame.stride(plane)
      return Plane(ini:ini,stride:stride)
    

I will give it another shot as you suggest.

Regarding the triple iteration, I had noticed that, but I don't want to change 
it (yet) until I get performance similar to the C++ version with a similar 
algorithm.

I tried to implement a filter with 
[ArrayMancer](https://github.com/mantielero/VapourSynth.nim/blob/master/src/filters/Mancer.nim)
 but the results were worse.

My short-term objective is to reach something on the order of 3000fps using 
float32 and multithreading (keeping it similar to the C++ algorithm).

The next step is to eliminate the triple loop over the rows and to see if I 
can reduce the number of times that I call getStride and getPtr.
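
A rough sketch of the idea of reducing those calls: fetch the pointer and the stride once per plane, then walk rows and columns with plain index arithmetic. Everything below is an assumption for illustration only. `Plane8` and `sumPlane` are hypothetical names, and a plain `seq[uint8]` stands in for a real frame, so this is not the VapourSynth.nim code.

```nim
# Hypothetical stand-in for one 8-bit plane; not the VapourSynth.nim types.
type
  Plane8 = object
    data: ptr UncheckedArray[uint8]  # first byte of the plane
    stride: int                      # bytes between the starts of two rows
    width, height: int

proc sumPlane(p: Plane8): uint64 =
  ## Adds up every sample, reading the pointer and stride only once.
  let base = p.data
  let stride = p.stride
  for y in 0 ..< p.height:
    let rowStart = y * stride        # one multiplication per row
    for x in 0 ..< p.width:
      result += base[rowStart + x].uint64

when isMainModule:
  # 3 rows of 4 samples each, padded to a stride of 8 bytes.
  var buf = newSeq[uint8](3 * 8)
  for y in 0 ..< 3:
    for x in 0 ..< 4:
      buf[y * 8 + x] = uint8(y + 1)
  let p = Plane8(data: cast[ptr UncheckedArray[uint8]](addr buf[0]),
                 stride: 8, width: 4, height: 3)
  echo sumPlane(p)
```

The padding bytes beyond `width` in each row are never touched, which is the reason to index by stride rather than by width.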

In the long term I'd like to try vectorization.
