2014-03-29 2:10 GMT+01:00 James Almer <[email protected]>:
> You're right that it's all float data, but both Christophe and I tested and
> xorps/shufps was a bit slower than pxor/pshufd (At least in my tests it was
> about five cycles slower), so i decided to use some ifdeffery to keep the
> SSE2 version intact.

I can confirm this: James first did what you proposed, and I mentioned
that I had benchmarked it as slower. He observed the same, hence the
current code.

If this were always true, it would be nice to have something like
xorps/..., a macro that switches to either instruction depending on the
instruction set. Not sure x264 would benefit from this, of course.

-- 
Christophe
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
