On 24/01/13 22:30, Daniel Kang wrote:
>>>> Depending on the architecture (??) the functions are inlined, but are
>>>> often not. I suspect GCC's insane method of reordering registers
>>>> swallows any overhead from calling these functions, but due to macro
>>>> hell, I'm not sure of the best way to test this.
>>>
>>> Sorry, this was not very clear. I think the yasm version is faster
>>> despite calling overhead, because GCC uses some ridiculous method of
>>> reordering registers for the inline assembly.
>>
>> Do you have numbers?
>
> Here's an example:
>
> yasm (put_qpel16_mc21):
> 8285
> 8333
> 8278
> 8347
> 8273
> AVG: 8303.2
>
> inline (put_qpel16_mc21):
> 8505
> 8424
> 8295
> 8400
> 8461
> AVG: 8417
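To put the quoted numbers in perspective, here is a minimal sketch that recomputes the two averages and adds the sample standard deviation and the relative difference. The timings are copied from the message above; the unit is not stated there, so the script treats them as opaque counts. The names `summarize` and `speedup_pct` are only illustrative.

```python
from statistics import mean, stdev

# Timings quoted in the thread for put_qpel16_mc21.
# The unit is not given in the original message.
yasm = [8285, 8333, 8278, 8347, 8273]
inline = [8505, 8424, 8295, 8400, 8461]


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of timings."""
    return mean(samples), stdev(samples)


yasm_avg, yasm_sd = summarize(yasm)
inline_avg, inline_sd = summarize(inline)

# Relative saving of the yasm version, as a percentage of the inline time.
speedup_pct = (inline_avg - yasm_avg) / inline_avg * 100

print(f"yasm:   avg={yasm_avg:.1f}  sd={yasm_sd:.1f}")
print(f"inline: avg={inline_avg:.1f}  sd={inline_sd:.1f}")
print(f"yasm is {speedup_pct:.2f}% faster on average")
```

With only five runs per variant, the spread is worth reading next to the averages before drawing conclusions from the gap.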
While monitoring patches we got similar results: we definitely over-inline, and/or GCC is too optimistic about the data cache.

http://blog.flameeyes.eu/2013/01/postmortem-of-a-patch-or-how-do-you-find-what-changed

That post has some information about the tools in use; it could probably go into the developer documentation sooner or later.

(Yes, from time to time I do benchmark stuff I consider interesting.)

lu
