On Feb 25, 2013, at 20:19, Claudio Freire wrote:

> That's because __builtin_assume_aligned isn't being called (most
> likely, didn't check). That results in **far** sub-optimal
> vectorization. I don't know about the failing tests though.
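For reference, the hint mentioned in the quote above is typically used roughly as follows. This is only a minimal sketch, not code from FFmpeg: the function name scale_floats and the 16-byte alignment are assumptions chosen for illustration. __builtin_assume_aligned is a GCC/Clang builtin that returns its pointer argument and lets the compiler assume the stated alignment. If the pointer is not actually aligned that way, behaviour is undefined.

```c
#include <stddef.h>

/* Hypothetical helper (not from FFmpeg): scale n floats in place. */
static void scale_floats(float *restrict buf, size_t n, float k)
{
    /* Promise the compiler that buf is 16-byte aligned (assumed here).
     * With this hint the vectoriser may be able to use aligned
     * loads/stores without a runtime alignment check or peeling loop. */
    float *p = __builtin_assume_aligned(buf, 16);

    for (size_t i = 0; i < n; i++)
        p[i] *= k;
}
```

Whether a loop like this actually gets vectorised, with or without the hint, depends on the compiler version and flags. Recent GCC releases can report it with -O3 -fopt-info-vec.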
I doubt that call (or rather, token?) is required on OS X, where memory allocations (and the stack) are already aligned. I know of one case where the absence of the token didn't prevent a very substantial performance gain, but I haven't checked whether that's always the case.

I have a list of the loops that were vectorised (hard to read, as I build with -j4 :)). I kind of expect those loops to be in places where vectorisation doesn't change the overall performance picture, either because the containing functions don't do any significant work (think initialisation) or because they simply aren't called at all (catch-all cases for which there is no hand-coded mmx/sse/... function, and which the test suite doesn't exercise). If that hunch is correct, the absence of a performance gain isn't surprising.

I guess I ought to repeat the comparison in x86_64 mode...

R.
_______________________________________________
Libav-user mailing list
http://ffmpeg.org/mailman/listinfo/libav-user
