Sorry, discard my last message.
I only looked at *ModifyCoord_UnDist_PTLens_SSE*, so my impression is not
valid.
On Fri, Oct 10, 2014 at 3:42 PM, Roman Lebedev wrote:
>>> - There were two identical *ModifyColor_DeVignetting_PA_SSE2* functions
>>> that differed only in the loading/storing part (alignment issues); I
>>> was able to collapse them into one function with the help of *C++11 lambdas*.
>>
>> Perfect!
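A minimal sketch of the lambda technique described above, assuming an x86 target with SSE: the kernel body is written once as a template, and the aligned (`_mm_load_ps`/`_mm_store_ps`) and unaligned (`_mm_loadu_ps`/`_mm_storeu_ps`) load/store steps are passed in as lambdas. The names `scale_sse_impl` and `scale_sse` and the simple gain kernel are hypothetical; this is not the actual code of *ModifyColor_DeVignetting_PA_SSE2*.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: multiply every float by a constant gain, four at a time.
// Only the load/store steps differ between the aligned and unaligned variants,
// so they are passed in as callables (C++11 lambdas at the call sites below).
template <typename Load, typename Store>
static void scale_sse_impl(float* data, std::size_t n, float gain,
                           Load load, Store store)
{
    assert(data != nullptr || n == 0);
    const __m128 g = _mm_set1_ps(gain);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        store(data + i, _mm_mul_ps(load(data + i), g));
    // Scalar tail for the remaining n % 4 elements.
    for (; i < n; ++i)
        data[i] *= gain;
}

// Hypothetical entry point: one kernel body, two instantiations.
void scale_sse(float* data, std::size_t n, float gain)
{
    if (reinterpret_cast<std::uintptr_t>(data) % 16 == 0) {
        // 16-byte aligned base pointer: data + i (i a multiple of 4) stays aligned.
        scale_sse_impl(data, n, gain,
                       [](const float* p) { return _mm_load_ps(p); },
                       [](float* p, __m128 v) { _mm_store_ps(p, v); });
    } else {
        scale_sse_impl(data, n, gain,
                       [](const float* p) { return _mm_loadu_ps(p); },
                       [](float* p, __m128 v) { _mm_storeu_ps(p, v); });
    }
}
```

Each lambda has its own closure type, so the template is instantiated twice and the compiler can inline the load/store calls; in an optimized build this would be expected to cost about the same as two hand-written functions.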
>
> Now that I have looked at mod-coord-sse.cpp
> <https://github.com/LebedevRI/lensfun/blob/master/libs/lensfun/mod-coord-sse.cpp>,
> I see that it only works on properly aligned buffers and otherwise falls
> back to plain non-SSE code.
> So it *might* make sense to do the same for mod-color-sse2.cpp
> <https://github.com/LebedevRI/lensfun/blob/vectorization/libs/lensfun/mod-color-sse2.cpp>
> too, *AND properly document that!*
> But I guess that decision is up to you.
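The alignment-gated approach described above (SSE path only for properly aligned buffers, plain scalar code otherwise) could look roughly like this for a strided image. Everything here is a hypothetical illustration, not code from mod-coord-sse.cpp; the point is that every row must start on a 16-byte boundary, so both the base pointer and the row stride have to be checked before taking the aligned path.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is 16-byte aligned, as required by _mm_load_ps/_mm_store_ps.
static bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Plain scalar fallback (hypothetical kernel: add a constant offset).
static void add_offset_scalar(float* row, std::size_t n, float offset)
{
    for (std::size_t i = 0; i < n; ++i)
        row[i] += offset;
}

// SSE path: only valid when row is 16-byte aligned.
static void add_offset_sse_aligned(float* row, std::size_t n, float offset)
{
    assert(is_aligned16(row));
    const __m128 o = _mm_set1_ps(offset);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_store_ps(row + i, _mm_add_ps(_mm_load_ps(row + i), o));
    for (; i < n; ++i)  // scalar tail
        row[i] += offset;
}

// Processes `height` rows of `width` floats, rows `stride_bytes` apart.
// Returns true if the aligned SSE path was taken for all rows.
bool add_offset_image(float* image, std::size_t width, std::size_t height,
                      std::size_t stride_bytes, float offset)
{
    // Every row start must be aligned: check the base pointer and the stride.
    const bool use_sse = is_aligned16(image) && stride_bytes % 16 == 0;
    unsigned char* base = reinterpret_cast<unsigned char*>(image);
    for (std::size_t y = 0; y < height; ++y) {
        float* row = reinterpret_cast<float*>(base + y * stride_bytes);
        if (use_sse)
            add_offset_sse_aligned(row, width, offset);
        else
            add_offset_scalar(row, width, offset);
    }
    return use_sse;
}
```

Documenting this behaviour matters because a caller passing, for example, an image whose row stride is not a multiple of 16 bytes silently gets the slower scalar path.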
>
> On Thu, Oct 9, 2014 at 10:55 PM, Sebastian Kraft wrote:
>
>> Hi Roman,
>>
>> thanks for your work on the SSE implementation, looks great!
>>
>> >
>> > - *C++11* is now required for vectorized code (SSE/SSE2) *(needed for
>> >   the next change)*.
>> > - If the compiler does not support *C++11* (e.g. gcc < 4.7), the
>> >   *SSE*/*SSE2* hand-vectorized code will not be compiled.
>>
>> Do you know if this works with Visual Studio? There are some people
>> using VS2013 to compile lensfun.
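On the Visual Studio question: if the C++11 check is done in the preprocessor, `__cplusplus` alone is not enough for MSVC. The sketch below uses a hypothetical macro name, `LF_HAVE_CXX11_LAMBDAS`, and is not lensfun's actual build logic (which may well live in CMake instead).

```cpp
#include <cassert>

// Hypothetical feature macro (not lensfun's actual build logic).
// GCC and Clang define __cplusplus as 201103L or later in C++11 (or newer) mode.
// MSVC reports 199711L unless /Zc:__cplusplus is given (available since
// VS2017 15.7), although VS2013 (_MSC_VER 1800) already supports lambdas,
// so MSVC is detected through _MSC_VER instead.
#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1800)
#  define LF_HAVE_CXX11_LAMBDAS 1
#else
#  define LF_HAVE_CXX11_LAMBDAS 0
#endif

#if LF_HAVE_CXX11_LAMBDAS
// The lambda-based SSE/SSE2 code would only be compiled in this branch.
bool vectorized_code_enabled() { return true; }
#else
bool vectorized_code_enabled() { return false; }
#endif
```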
>>
>> > - There were two identical *ModifyColor_DeVignetting_PA_SSE2* functions
>> >   that differed only in the loading/storing part (alignment issues); I
>> >   was able to collapse them into one function with the help of *C++11 lambdas*.
>>
>> Perfect!
>>
>> > - With CMAKE_BUILD_TYPE Debug, *ModifyColor_DeVignetting_PA_Select_SSE* is
>> >   *slower* than the previous non-vectorized code.
>>
>> That may happen...
>>
>> > - For the lf_f32 pixel type, there seems to be no difference in performance
>> >   between the version of the code using *C++11 lambdas* and the old
>> >   *two-function approach*.
>> > - With CMAKE_BUILD_TYPE Debug, *ModifyColor_DeVignetting_PA_Select_SSE* is
>> >   *~1.4 times faster* than the previous non-vectorized code.
>> >
>>
>> Looks good. Did you compare the hand-optimized SSE code with the normal
>> code compiled with -O3?
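One way to answer the -O3 question is a small timing harness that is built once as Debug and once as Release (or with -O3) and compares a plain loop against a hand-vectorized one. This is a sketch with hypothetical kernels (`mul_scalar`, `mul_sse`), assuming an x86 target with SSE; real measurements should use representative image sizes and pixel types.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <ratio>
#include <vector>

// Scalar reference: out[i] = in[i] * gain (hypothetical kernel).
static void mul_scalar(const float* in, float* out, std::size_t n, float gain)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}

// Hand-vectorized version using unaligned loads/stores.
static void mul_sse(const float* in, float* out, std::size_t n, float gain)
{
    const __m128 g = _mm_set1_ps(gain);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(in + i), g));
    for (; i < n; ++i)
        out[i] = in[i] * gain;
}

// Runs fn `reps` times and returns the average wall time in microseconds.
template <typename Fn>
double time_us(Fn fn, int reps)
{
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count() / reps;
}

// Times both versions on the same input, prints the averages and
// returns true if both produced identical output.
bool compare(std::size_t n, int reps)
{
    std::vector<float> in(n, 1.5f), out_a(n), out_b(n);
    const double t_scalar =
        time_us([&] { mul_scalar(in.data(), out_a.data(), n, 0.5f); }, reps);
    const double t_sse =
        time_us([&] { mul_sse(in.data(), out_b.data(), n, 0.5f); }, reps);
    std::printf("scalar: %.1f us, sse: %.1f us\n", t_scalar, t_sse);
    return out_a == out_b;
}
```

An optimizing compiler may auto-vectorize `mul_scalar` at -O3, so the gap measured in a Release build can be much smaller than in a Debug build. Returning the comparison of both outputs also keeps the computed results in use.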
>>
>> Currently I am restructuring the source tree a bit and also started to
>> implement a test suite for lensfun. I will post more information on this
>> probably at the weekend. Would be great if we could also add some
>> performance testing to see where further bottlenecks are hidden in the
>> code. And we should add some tests to verify that SSE code and normal
>> code both give accurate results. Maybe you can help to write tests for
>> the vignetting part?
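For the accuracy tests, one possible shape is to run the scalar and SSE versions on the same pseudo-random input and compare element-wise with a relative tolerance, since reordered floating-point operations need not give bit-identical results. The radial gain used here is a hypothetical stand-in, not lensfun's vignetting model, and the tolerance is an assumption to be tuned.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical radial gain: g(r2) = 1 + k1*r2 + k2*r2^2, applied as in / g.
static float gain_scalar(float r2, float k1, float k2)
{
    return 1.0f + k1 * r2 + k2 * r2 * r2;
}

static void correct_scalar(const float* in, const float* r2, float* out,
                           std::size_t n, float k1, float k2)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] / gain_scalar(r2[i], k1, k2);
}

static void correct_sse(const float* in, const float* r2, float* out,
                        std::size_t n, float k1, float k2)
{
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 vk1 = _mm_set1_ps(k1), vk2 = _mm_set1_ps(k2);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128 r = _mm_loadu_ps(r2 + i);
        const __m128 g = _mm_add_ps(_mm_add_ps(one, _mm_mul_ps(vk1, r)),
                                    _mm_mul_ps(vk2, _mm_mul_ps(r, r)));
        _mm_storeu_ps(out + i, _mm_div_ps(_mm_loadu_ps(in + i), g));
    }
    for (; i < n; ++i)  // scalar tail
        out[i] = in[i] / gain_scalar(r2[i], k1, k2);
}

// Returns true if both implementations agree within rel_tol on n random pixels.
bool sse_matches_scalar(std::size_t n, float rel_tol)
{
    assert(rel_tol >= 0.0f);
    std::mt19937 rng(42);  // fixed seed for reproducible tests
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> in(n), r2(n), ref(n), vec(n);
    for (std::size_t i = 0; i < n; ++i) { in[i] = dist(rng); r2[i] = dist(rng); }
    correct_scalar(in.data(), r2.data(), ref.data(), n, -0.3f, 0.05f);
    correct_sse(in.data(), r2.data(), vec.data(), n, -0.3f, 0.05f);
    for (std::size_t i = 0; i < n; ++i)
        if (std::fabs(vec[i] - ref[i]) > rel_tol * std::fabs(ref[i]) + 1e-7f)
            return false;
    return true;
}
```

Using an odd pixel count such as 1003 also exercises the scalar tail after the last full group of four.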
>>
>> Sebastian
>>
>>
>> _______________________________________________
>> Lensfun-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/lensfun-users
>>
>
>