>
>> - There were two identical *ModifyColor_DeVignetting_PA_SSE2* functions
>> that differed only in the loading/storing part (alignment issues); I
>> was able to collapse them into one function with the help of *C++11 lambdas*.
>
> Perfect!

I have now looked at mod-coord-sse.cpp
<https://github.com/LebedevRI/lensfun/blob/master/libs/lensfun/mod-coord-sse.cpp>:
it only works on properly aligned buffers and otherwise falls back to
plain non-SSE code.
So it *might* make sense to do the same for mod-color-sse2.cpp
<https://github.com/LebedevRI/lensfun/blob/vectorization/libs/lensfun/mod-color-sse2.cpp>
too,
*AND to document that properly!*
But I guess that decision is up to you.
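
To make the "aligned buffers only, otherwise fall back" idea concrete, here is
a minimal sketch. It is not the lensfun code: is_aligned_16, scale_scalar and
scale_pixels are hypothetical names, and a simple gain multiplication stands
in for the real de-vignetting math. It only shows the dispatch on alignment.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: true if p sits on a 16-byte boundary.
static bool is_aligned_16(const void *p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Plain non-SSE path: multiply every sample by gain.
static void scale_scalar(float *px, std::size_t count, float gain) {
    for (std::size_t i = 0; i < count; ++i)
        px[i] *= gain;
}

// Uses aligned SSE loads/stores only when the buffer is 16-byte aligned,
// otherwise falls back to the scalar loop.
// Returns true if the vectorized path was taken.
static bool scale_pixels(float *px, std::size_t count, float gain) {
#if defined(__SSE2__)
    if (is_aligned_16(px)) {
        const __m128 g = _mm_set1_ps(gain);
        std::size_t i = 0;
        // px is aligned and i advances by 4 floats (16 bytes),
        // so every px + i is aligned as well.
        for (; i + 4 <= count; i += 4)
            _mm_store_ps(px + i, _mm_mul_ps(_mm_load_ps(px + i), g));
        scale_scalar(px + i, count - i, gain);  // leftover samples
        return true;
    }
#endif
    scale_scalar(px, count, gain);
    return false;
}
```

The cost of this design is that callers with misaligned rows silently get the
slow path, which is why documenting the alignment requirement matters.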

On Thu, Oct 9, 2014 at 10:55 PM, Sebastian Kraft <[email protected]>
wrote:

> Hi Roman,
>
> thanks for your work on the SSE implementation, looks great!
>
> >
> >    - *C++11* is now required for vectorized code (sse/sse2) *(needed for
> >    the next change)*.
> >       - If the compiler does not support *C++11* (e.g. gcc < 4.7), the
> >       *SSE*/*SSE2* hand-vectorized code will not be compiled.
>
> Do you know if this works with Visual Studio? There are some people
> using VS2013 to compile lensfun.
>
> >    - There were two identical *ModifyColor_DeVignetting_PA_SSE2*
> >    functions that differed only in the loading/storing part (alignment
> >    issues); I was able to collapse them into one function with the help
> >    of *C++11 lambdas*.
>
> Perfect!
>
> >    - With CMAKE_BUILD_TYPE Debug,
> >    *ModifyColor_DeVignetting_PA_Select_SSE* is *slower* than the previous
> >    non-vectorized code.
>
> That may happen...
>
> >    - For the lf_f32 pixel type, there seems to be no difference in
> >    performance between the version of the code using *C++11 lambdas* and
> >    the old *two-function approach*.
> >    - With CMAKE_BUILD_TYPE Debug,
> >    *ModifyColor_DeVignetting_PA_Select_SSE* is *~1.4 times faster* than
> >    the previous non-vectorized code.
> >
>
> Looks good. Did you compare the hand-optimized SSE code with normal code
> built with -O3 optimization?
>
> Currently I am restructuring the source tree a bit and also started to
> implement a test suite for lensfun. I will post more information on this
> probably at the weekend. It would be great if we could also add some
> performance testing to see where further bottlenecks are hidden in the
> code. And we should add some tests to verify that SSE code and normal
> code both give accurate results. Maybe you can help to write tests for
> the vignetting part?
>
> Sebastian
>
>
> _______________________________________________
> Lensfun-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/lensfun-users
>