>
> This seems to be inconsistent... ModifyCoord_Dist_PTLens_SSE() falls
> back to non SSE code for unaligned buffers. However
> ModifyCoord_UnDist_PTLens_SSE() does not check for alignment but uses
> slow unaligned loads all the time.
> From my point of view your solution in mod-color-sse2.cpp is nice for
> this case as it removes duplicate code and allows the use of SSE for
> aligned and unaligned memory.
> On the other hand I am not sure if we need SSE code for unaligned
> buffers. AFAIK most recent compilers on 64bit systems align to at least
> 16 bytes by default and most applications use SSE and therefore
> internally already use aligned buffers. I assume that we can expect that
> if a system has support for SSE (all 64bit processors) we get properly
> aligned memory. Of course we need to check and fall back to non SSE code
> if that is not the case.
> So adding all the lambdas and different versions might only be relevant
> for a really really low percentage of users and not worth the effort.
> What do you think?
All of these assumptions look sane to me.
It is basically a question of code quality versus a modest speed increase
from running SSE code on unaligned buffers, which only matters to some users.
If we decide that the former is more important:
- No code duplication.
- No *C++11 Lambdas* are needed to avoid code duplication.
- We can also get rid of *_Select_* helper functions.
- Probably no need to bump the required standard to C++11, though I am not sure.
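
For reference, the kind of lambda-based wrapper that would become unnecessary can be sketched as follows. This is a minimal illustration, not the actual lensfun code: scale_kernel and scale_any are hypothetical names, the kernel just multiplies floats by a constant, and an x86 target with SSE is assumed. The point is only that one kernel body is shared, and the aligned or unaligned load/store is passed in as a lambda.

```cpp
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target with SSE

// Hypothetical kernel: the arithmetic is written once, while the load and
// store operations are supplied by the caller.
template <typename Load, typename Store>
static void scale_kernel(float *buf, std::size_t n, float k, Load load, Store store)
{
    const __m128 vk = _mm_set1_ps(k);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        store(buf + i, _mm_mul_ps(load(buf + i), vk));
    for (; i < n; ++i)  // scalar tail for the remaining 0..3 elements
        buf[i] *= k;
}

// Hypothetical dispatcher: picks aligned or unaligned accessors at run time.
static void scale_any(float *buf, std::size_t n, float k)
{
    if ((reinterpret_cast<std::uintptr_t>(buf) & 15u) == 0)
        scale_kernel(buf, n, k,
                     [](const float *p) { return _mm_load_ps(p); },
                     [](float *p, __m128 v) { _mm_store_ps(p, v); });
    else
        scale_kernel(buf, n, k,
                     [](const float *p) { return _mm_loadu_ps(p); },
                     [](float *p, __m128 v) { _mm_storeu_ps(p, v); });
}
```

Dropping SSE support for unaligned buffers removes the second branch, and with it the need for the template and the lambdas.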
How many applications will be affected by this (speed-wise)?
I cannot speak for all of them, but the needed changes are so small that I
believe we can safely assume it is fine.
Required changes (example):
Old:
> void *ptr = malloc((size_t)4*W*H*sizeof(float));
New:
> void *ptr = NULL;
> if(posix_memalign(&ptr, 16, (size_t)4*W*H*sizeof(float))) return NULL;
>
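
Wrapped into a function, the new allocation could look roughly like the sketch below. alloc_pixels_aligned is a hypothetical helper name, and a POSIX system is assumed (posix_memalign is not available everywhere, e.g. not on Windows). The buffer layout (4 floats per pixel, W x H pixels) is taken from the example above.

```cpp
#include <cstddef>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: allocate W*H pixels of 4 floats each, with the
// start of the buffer aligned to 16 bytes as required by aligned SSE loads.
static float *alloc_pixels_aligned(std::size_t W, std::size_t H)
{
    void *ptr = NULL;
    // posix_memalign returns 0 on success and an error number on failure.
    if (posix_memalign(&ptr, 16, (std::size_t)4 * W * H * sizeof(float)) != 0)
        return NULL;
    return static_cast<float *>(ptr);
}
```

Memory obtained this way is released with free(), exactly as with malloc().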
So yes, *I* believe we can simplify it by removing the wrappers around SSE for
unaligned buffers, adding a check for proper alignment, and falling back to
non-SSE code when the buffer is not aligned.
However, the final decision is yours to make.
I will change my code (and all other LF code, if needed) accordingly,
depending on the final decision.
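
The "check alignment, else fall back" scheme could be sketched like this. All names here (is_aligned_16, scale_plain, scale_sse_aligned, scale, Path) are hypothetical and the kernel is a trivial stand-in, not one of the real lensfun modifiers. The SSE path is only compiled when the compiler defines __SSE__.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

enum class Path { Plain, SSE };

// True if ptr sits on a 16-byte boundary.
static inline bool is_aligned_16(const void *ptr)
{
    return (reinterpret_cast<std::uintptr_t>(ptr) & 15u) == 0;
}

// Plain (non-SSE) code path; works for any alignment.
static void scale_plain(float *buf, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        buf[i] *= k;
}

#if defined(__SSE__)
// SSE code path; uses aligned loads/stores, so buf must be 16-byte aligned.
static void scale_sse_aligned(float *buf, std::size_t n, float k)
{
    const __m128 vk = _mm_set1_ps(k);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_store_ps(buf + i, _mm_mul_ps(_mm_load_ps(buf + i), vk));
    for (; i < n; ++i)  // scalar tail
        buf[i] *= k;
}
#endif

// Entry point: use SSE only for aligned buffers, otherwise fall back.
static Path scale(float *buf, std::size_t n, float k)
{
#if defined(__SSE__)
    if (is_aligned_16(buf)) {
        scale_sse_aligned(buf, n, k);
        return Path::SSE;
    }
#endif
    scale_plain(buf, n, k);
    return Path::Plain;
}
```

With this layout there is a single SSE implementation per modifier and no unaligned SSE variant to maintain.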
Roman.
On Fri, Oct 10, 2014 at 11:35 PM, Sebastian Kraft <[email protected]>
wrote:
> On 10.10.2014 at 13:42, Roman Lebedev wrote:
> >>
> >> - There were two identical *ModifyColor_DeVignetting_PA_SSE2* functions
> >>   that differed only in the loading/storing part (alignment issues); I
> >>   was able to collapse them into one function with the help of *C++11 Lambdas*.
> >>
> >> Perfect!
> >
> > Now I have looked at mod-coord-sse.cpp
> > <https://github.com/LebedevRI/lensfun/blob/master/libs/lensfun/mod-coord-sse.cpp>,
> > and it only works on properly aligned buffers, otherwise falling back to
> > plain non-SSE code.
>
> Sebastian
>
>
_______________________________________________
Lensfun-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/lensfun-users