Hello.

I believe my first changes to Lensfun are ready to be discussed.
They are in the "vectorization" branch of my lensfun GitHub repo:
https://github.com/LebedevRI/lensfun/tree/vectorization

Quick breakdown:

   - *C++11* is now required for the vectorized code (*SSE*/*SSE2*).
   *(needed for the next change)*
      - If the compiler does not support *C++11* (e.g. gcc < 4.7), the
      *SSE*/*SSE2* hand-vectorized code will not be compiled.
   - There were two almost identical *ModifyColor_DeVignetting_PA_SSE2*
   functions that differed only in the load/store part (alignment
   issues). I was able to collapse them into one function with the help
   of *C++11 Lambdas*.
   - Added an *SSE2* suffix to *ModifyColor_DeVignetting_PA_Select* (for
   the lf_f16 pixel type). Reason: that specific function is located in
   the sse2 file, and there may be similar functions in other files.
   *(needed for the next change)*
   - Added *ModifyColor_DeVignetting_PA_Select_SSE* (for the lf_f32 pixel
   type).
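
To illustrate the two ideas from the list above (the C++11 guard and the
lambda-based collapse of the aligned/unaligned variants), here is a
minimal sketch. It is not the code from the branch: scale, scale_kernel,
scale_scalar and SKETCH_HAVE_SSE are hypothetical names, and the
"multiply by a gain" body is only a stand-in for the real de-vignetting
math. The load/store step is passed in as a lambda, so one kernel serves
both alignment cases.

```cpp
// Hypothetical sketch, NOT lensfun's actual code: all names are made up.
#include <cassert>
#include <cstddef>

// Build the hand-vectorized path only with C++11 (lambdas) and SSE.
#if __cplusplus >= 201103L && defined(__SSE__)
#include <cstdint>
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Plain scalar version, also used for the tail of the buffer.
static void scale_scalar(float *px, std::size_t count, float gain) {
    for (std::size_t i = 0; i < count; ++i)
        px[i] *= gain;
}

#ifdef SKETCH_HAVE_SSE
// Single kernel; only the load/store step differs between callers.
template <typename Load, typename Store>
static void scale_kernel(float *px, std::size_t count, float gain,
                         Load load, Store store) {
    assert(count % 4 == 0);  // caller passes whole 4-float groups only
    const __m128 g = _mm_set1_ps(gain);
    for (std::size_t i = 0; i < count; i += 4)
        store(px + i, _mm_mul_ps(load(px + i), g));
}
#endif

// Picks aligned or unaligned access at run time.
void scale(float *px, std::size_t count, float gain) {
#ifdef SKETCH_HAVE_SSE
    const std::size_t vec = count & ~std::size_t(3);
    if (reinterpret_cast<std::uintptr_t>(px) % 16 == 0)
        scale_kernel(px, vec, gain,
                     [](const float *p) { return _mm_load_ps(p); },
                     [](float *p, __m128 v) { _mm_store_ps(p, v); });
    else
        scale_kernel(px, vec, gain,
                     [](const float *p) { return _mm_loadu_ps(p); },
                     [](float *p, __m128 v) { _mm_storeu_ps(p, v); });
    scale_scalar(px + vec, count - vec, gain);  // leftover elements
#else
    scale_scalar(px, count, gain);  // no C++11 or no SSE: scalar only
#endif
}
```

Since the lambdas are template arguments, an optimizing compiler can
inline them, which would be consistent with seeing no difference against
the two-function approach. In an unoptimized build the extra calls may
stay in place.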


I also have some performance numbers (lensfun-perf):
https://docs.google.com/spreadsheets/d/1Kd6AWBS2V8mL50ankNKpGSZX9YOHNzYd-2Pg_x7uwlg/edit?usp=sharing

They were acquired using *darktable-cli* and *AMD CodeXL*, for the lf_f32
pixel type.

Quick performance breakdown:

   - With CMAKE_BUILD_TYPE Debug, *ModifyColor_DeVignetting_PA_Select_SSE* is
   *slower* than the previous non-vectorized code.
   - For the lf_f32 pixel type, there seems to be no difference in
   performance between the version of the code using *C++11 Lambdas* and
   the old *two-function approach*.
   - With CMAKE_BUILD_TYPE Release, *ModifyColor_DeVignetting_PA_Select_SSE* is
   *~1.4 times faster* than the previous non-vectorized code.

If there are any questions, please ask.

My main priority is lf_f32.
But *if* this work merges smoothly, I *may* work on other pixel types
too.

Roman.