Hi,

On Wed, Jan 23, 2013 at 1:16 PM, Daniel Kang <[email protected]> wrote:
> On Wed, Jan 23, 2013 at 4:14 PM, Daniel Kang <[email protected]> wrote:
>> On Wed, Jan 23, 2013 at 12:36 PM, Ronald S. Bultje <[email protected]>
>> wrote:
>>> Hi Daniel,
>>>
>>> On Tue, Jan 22, 2013 at 11:19 PM, Daniel Kang <[email protected]>
>>> wrote:
>>>> @@ -1330,10 +1087,12 @@ static void OPNAME ## qpel8_mc12_ ## MMX(uint8_t
>>>> *dst, uint8_t *src, \
>>>> { \
>>>> uint64_t half[8 + 9]; \
>>>> uint8_t * const halfH = ((uint8_t*)half); \
>>>> - put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8, \
>>>> - stride, 9); \
>>>> - put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH, 8, stride, 9); \
>>>> - OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH, stride, 8); \
>>>> + ff_put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8, \
>>>> + stride, 9); \
>>>> + ff_put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH, \
>>>> + 8, stride, 9); \
>>>> + ff_ ## OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH, \
>>>> + stride, 8); \
>>>> } \
>>>
>>> So, for all cases like this, does this actually affect speed? I mean,
>>> previously this could be inlined, now it no longer can be. I wonder if
>>> that has any effect on speed (i.e. was it ever inlined previously?).
>>
>> Depending on the architecture (??) the functions are inlined, but are
>> often not. I suspect GCC's insane method of reordering registers
>> swallows any overhead from calling these functions, but due to macro
>> hell, I'm not sure of the best way to test this.
>
> Sorry, this was not very clear. I think the yasm version is faster
> despite calling overhead, because GCC uses some ridiculous method of
> reordering registers for the inline assembly.
Do you have numbers?

Ronald
