On Feb 25, 2013, at 20:19, Claudio Freire wrote:

> That's because __builtin_assume_aligned isn't being called (most
> likely, didn't check). That results in **far** sub-optimal
> vectorization. I don't know about the failing tests though.

I doubt that call (or rather, token?) is required on OS X, where memory 
allocations (and the stack) are already 16-byte aligned. I know of a case where 
the absence of the token didn't prevent a very substantial performance gain, but 
I haven't checked whether that's always the case.
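
For reference, a minimal sketch of what the token looks like in use. This is 
not taken from the FFmpeg/Libav sources: scale_add() and its parameters are 
made-up names, and the 16-byte figure is an assumption matching SSE. The 
builtin itself is the GCC/Clang one; it only promises the compiler that the 
pointer is aligned, it doesn't align anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: dst[i] += k * src[i]. */
static void scale_add(float *restrict dst, const float *restrict src,
                      float k, size_t n)
{
    /* The builtin is a promise, not a check, so verify it in debug builds. */
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);

    /* Tell the vectoriser that both buffers start on a 16-byte boundary
     * (assumed value), so it may pick aligned loads and stores. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    for (size_t i = 0; i < n; i++)
        d[i] += k * s[i];
}
```

If the allocator and the ABI already guarantee that alignment, the promise is 
true either way; whether the compiler generates equally good code without 
being told is the part that would need checking per loop.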

I have a list of the loops that were vectorised (hard to read as I build with 
-j4 :)). I kind of expect those loops to be in places where the vectorisation 
doesn't change the overall performance picture: either the containing functions 
don't do any significant work (think initialisation), or they simply aren't 
called at all (catch-all cases that only apply when there is no hand-coded 
mmx/sse/... function, and that the test suite doesn't exercise). If that hunch 
is correct, the absence of a performance gain isn't surprising.

I guess I ought to repeat the comparison in x86_64 mode...


R.
_______________________________________________
Libav-user mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/libav-user
