On Thu, 9 Feb 2017, Janne Grunau wrote:
>On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote:
>>On Sun, 5 Feb 2017, Janne Grunau wrote:
>>
>>>> // out1 = in1 + in2
>>>> // out2 = in1 - in2
>>>> .macro butterfly_8h out1, out2, in1, in2
>>>>@@ -463,7 +510,7 @@ function idct16x16_dc_add_neon
>>>> ret
>>>> endfunc
>>>>
>>>>-function idct16
>>>>+.macro idct16_full
>>>> dmbutterfly0 v16, v24, v16, v24, v2, v3, v4, v5, v6, v7 // v16 = t0a, v24 = t1a
>>>> dmbutterfly v20, v28, v0.h[1], v0.h[2], v2, v3, v4, v5 // v20 = t2a, v28 = t3a
>>>> dmbutterfly v18, v30, v0.h[3], v0.h[4], v2, v3, v4, v5 // v18 = t4a, v30 = t7a
>>>>@@ -485,7 +532,10 @@ function idct16
>>>> dmbutterfly0 v22, v26, v22, v26, v2, v3, v18, v19, v30, v31 // v22 = t6a, v26 = t5a
>>>> dmbutterfly v23, v25, v0.h[1], v0.h[2], v18, v19, v30, v31 // v23 = t9a, v25 = t14a
>>>> dmbutterfly v27, v21, v0.h[1], v0.h[2], v18, v19, v30, v31, neg=1 // v27 = t13a, v21 = t10a
>>>>+ idct16_end
>>>
>>>I think it would be clearer if idct16_end is used directly from the macro.
>>>It would probably also make sense to move idct16_end and avoid the
>>>idct16_full macro. The patch might be smaller and it is immediately
>>>obvious that there is no code change, but the resulting code is more
>>>complicated than it needs to be. The same applies to arm if we go with
>>>alternative 1.
>>
>>Ok, so you mean like this?
>>
>>function idct16
>> dmbutterfly...
>> ....
>> idct16_end
>>endfunc
>
>that would be one option, the other would be to move the idct_end
>instructions out of the existing idct16 function into a macro and use it
>as a macro. That would make the full idct structurally identical to the half
>and quarter versions and avoid a macro that is only used once.

I'm not really following what you're suggesting here - can you outline it
with a code sample like mine above?