Hello.
I believe my first changes to Lensfun are ready to be discussed.
They are in the "vectorization" branch of my lensfun GitHub repo:
https://github.com/LebedevRI/lensfun/tree/vectorization
Quick breakdown:
- *C++11* is now required for the vectorized code (SSE/SSE2). *(needed for
the next change)*
- If the compiler does not support *C++11* (e.g. gcc < 4.7), the
hand-vectorized *SSE*/*SSE2* code will not be compiled.
- There were two nearly identical *ModifyColor_DeVignetting_PA_SSE2*
functions that differed only in the load/store part (alignment issues). I
was able to collapse them into one function with the help of *C++11 lambdas*.
- Added an *SSE2* suffix to *ModifyColor_DeVignetting_PA_Select* (for the
lf_f16 pixel type). Reason: that function lives in the SSE2 file, and
there may be similar functions in other files. *(needed for
the next change)*
- Added *ModifyColor_DeVignetting_PA_Select_SSE* (for lf_f32 pixel type)
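
To illustrate the lambda idea from the list above, here is a minimal sketch. It is not the code from the branch: the names scale_kernel_sse, scale_pixels and SKETCH_HAVE_SSE are hypothetical, and the kernel just multiplies floats by a gain instead of doing real devignetting. It only shows how one templated kernel can take the aligned or unaligned load/store as lambdas, and how the whole SSE path can be skipped when C++11 or SSE is not available.

```cpp
#include <cstddef>

// Compile the SSE path only with a C++11 compiler that targets SSE
// (assumption: __SSE__ is how the target advertises it, as gcc/clang do).
#if __cplusplus >= 201103L && defined(__SSE__)
#include <cstdint>
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

#if SKETCH_HAVE_SSE
// One shared kernel; only the load/store operations are passed in.
template <typename Load, typename Store>
static void scale_kernel_sse(float *pixels, std::size_t count, float gain,
                             Load load, Store store)
{
    const __m128 g = _mm_set1_ps(gain);
    std::size_t i = 0;
    // Process four floats per iteration.
    for (; i + 4 <= count; i += 4)
        store(pixels + i, _mm_mul_ps(load(pixels + i), g));
    // Scalar tail for the remaining 0..3 floats.
    for (; i < count; ++i)
        pixels[i] *= gain;
}
#endif

// Hypothetical "select" function: picks aligned or unaligned access.
void scale_pixels(float *pixels, std::size_t count, float gain)
{
#if SKETCH_HAVE_SSE
    if (reinterpret_cast<std::uintptr_t>(pixels) % 16 == 0)
        scale_kernel_sse(pixels, count, gain,
            [](const float *p) { return _mm_load_ps(p); },
            [](float *p, __m128 v) { _mm_store_ps(p, v); });
    else
        scale_kernel_sse(pixels, count, gain,
            [](const float *p) { return _mm_loadu_ps(p); },
            [](float *p, __m128 v) { _mm_storeu_ps(p, v); });
#else
    // Plain fallback when the vectorized code is not compiled.
    for (std::size_t i = 0; i < count; ++i)
        pixels[i] *= gain;
#endif
}
```

Because the lambdas are template arguments, an optimizing compiler can inline them, so the two instantiations should end up equivalent to two hand-written functions. Without optimization the calls are typically not inlined, which is one plausible reason a Debug build behaves differently.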
I also have some performance numbers (lensfun-perf):
https://docs.google.com/spreadsheets/d/1Kd6AWBS2V8mL50ankNKpGSZX9YOHNzYd-2Pg_x7uwlg/edit?usp=sharing
They were acquired using *darktable-cli* and *AMD CodeXL*, for the lf_f32
pixel type.
Quick performance breakdown:
- With CMAKE_BUILD_TYPE Debug, *ModifyColor_DeVignetting_PA_Select_SSE* is
*slower* than the previous non-vectorized code.
- For the lf_f32 pixel type, there seems to be no difference in performance
between the version using *C++11 lambdas* and the old *two-function
approach*.
- With CMAKE_BUILD_TYPE Release, *ModifyColor_DeVignetting_PA_Select_SSE* is
*~1.4 times faster* than the previous non-vectorized code.
If there are questions, ask.
My main priority is lf_f32.
But *if* this work merges smoothly, I *may* work on other pixel
types too.
Roman.
_______________________________________________
Lensfun-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/lensfun-users