On Wed, Jan 23, 2013 at 5:20 PM, Ronald S. Bultje <[email protected]> wrote:
> Hi,
>
> On Wed, Jan 23, 2013 at 1:16 PM, Daniel Kang <[email protected]> wrote:
>> On Wed, Jan 23, 2013 at 4:14 PM, Daniel Kang <[email protected]> wrote:
>>> On Wed, Jan 23, 2013 at 12:36 PM, Ronald S. Bultje <[email protected]> wrote:
>>>> Hi Daniel,
>>>>
>>>> On Tue, Jan 22, 2013 at 11:19 PM, Daniel Kang <[email protected]> wrote:
>>>>> @@ -1330,10 +1087,12 @@ static void OPNAME ## qpel8_mc12_ ## MMX(uint8_t *dst, uint8_t *src,    \
>>>>>  {                                                                       \
>>>>>      uint64_t half[8 + 9];                                               \
>>>>>      uint8_t * const halfH = ((uint8_t*)half);                           \
>>>>> -    put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8,          \
>>>>> -                                                stride, 9);             \
>>>>> -    put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH, 8, stride, 9);  \
>>>>> -    OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH, stride, 8);     \
>>>>> +    ff_put ## RND ## mpeg4_qpel8_h_lowpass_ ## MMX(halfH, src, 8,       \
>>>>> +                                                   stride, 9);          \
>>>>> +    ff_put ## RND ## pixels8_l2_ ## MMX(halfH, src, halfH,              \
>>>>> +                                        8, stride, 9);                  \
>>>>> +    ff_ ## OPNAME ## mpeg4_qpel8_v_lowpass_ ## MMX(dst, halfH,          \
>>>>> +                                                   stride, 8);          \
>>>>>  }                                                                       \
>>>>
>>>> So, for all cases like this, does this actually affect speed? I mean,
>>>> previously this could be inlined, now it no longer can be. I wonder if
>>>> that has any effect on speed (i.e. was it ever inlined previously?).
>>>
>>> Depending on the architecture (??), the functions are sometimes
>>> inlined, but often they are not. I suspect GCC's insane method of
>>> reordering registers swallows any overhead from calling these
>>> functions, but due to macro hell, I'm not sure of the best way to
>>> test this.
>>
>> Sorry, this was not very clear. I think the yasm version is faster
>> despite calling overhead, because GCC uses some ridiculous method of
>> reordering registers for the inline assembly.
>
> Do you have numbers?

Here's an example:

yasm (put_qpel16_mc21):
8285
8333
8278
8347
8273
AVG: 8303.2

inline (put_qpel16_mc21):
8505
8424
8295
8400
8461
AVG: 8417
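
The averages and the relative gap between the two sets of runs can be rechecked with a small C sketch. The helper names (mean, percent_faster) and the array names are hypothetical and not part of libav; the samples are copied from the lists above, in whatever units the timer reported.

```c
#include <stddef.h>

/* Samples copied from the put_qpel16_mc21 runs listed above. */
static const double yasm_runs[]   = { 8285, 8333, 8278, 8347, 8273 };
static const double inline_runs[] = { 8505, 8424, 8295, 8400, 8461 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative saving of a compared to b, in percent: 100 * (b - a) / b. */
static double percent_faster(double a, double b)
{
    return 100.0 * (b - a) / b;
}
```

Calling percent_faster(mean(yasm_runs, 5), mean(inline_runs, 5)) gives a gap of a little over 1% in favour of yasm. The fastest inline run (8295) is below the slowest yasm run (8347), so five samples each may not be enough to separate the two cleanly.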