Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (alternative 1)

Martin Storsjö Wed, 08 Feb 2017 23:51:13 -0800

On Thu, 9 Feb 2017, Janne Grunau wrote:

On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote:

On Sun, 5 Feb 2017, Janne Grunau wrote:


>> // out1 = in1 + in2
>> // out2 = in1 - in2
>> .macro butterfly_8h out1, out2, in1, in2
>>@@ -463,7 +510,7 @@ function idct16x16_dc_add_neon
>>         ret
>> endfunc
>>
>>-function idct16
>>+.macro idct16_full
>>         dmbutterfly0    v16, v24, v16, v24, v2, v3, v4, v5, v6, v7 // v16 = t0a,  v24 = t1a
>>         dmbutterfly     v20, v28, v0.h[1], v0.h[2], v2, v3, v4, v5 // v20 = t2a,  v28 = t3a
>>         dmbutterfly     v18, v30, v0.h[3], v0.h[4], v2, v3, v4, v5 // v18 = t4a,  v30 = t7a
>>@@ -485,7 +532,10 @@ function idct16
>>         dmbutterfly0    v22, v26, v22, v26, v2, v3, v18, v19, v30, v31        // v22 = t6a,  v26 = t5a
>>         dmbutterfly     v23, v25, v0.h[1], v0.h[2], v18, v19, v30, v31        // v23 = t9a,  v25 = t14a
>>         dmbutterfly     v27, v21, v0.h[1], v0.h[2], v18, v19, v30, v31, neg=1 // v27 = t13a, v21 = t10a
>>+        idct16_end
>
>I think it would be clearer if idct16_end is used directly from the macro.
>It would probably also make sense to move idct16_end and avoid the
>idct16_full macro. The patch might be smaller, and it is immediately
>obvious that there is no code change, but the resulting code is more
>complicated than it needs to be. The same applies to arm if we go with
>alternative 1.

Ok, so you mean like this?

function idct16
        dmbutterfly...
        ....
        idct16_end
endfunc

That would be one option; the other would be to move the idct_end instructions out of the existing idct16 function as a macro and use it as a macro. That would make the full idct structurally identical to the half and quarter versions and avoid a macro that is only used once.

I'm not really following what you're suggesting here - can you outline it with a code sample like mine above?


// Martin