On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote:
> On Sun, 5 Feb 2017, Janne Grunau wrote:
>
> >> // out1 = in1 + in2
> >> // out2 = in1 - in2
> >> .macro butterfly_8h out1, out2, in1, in2
> >>@@ -463,7 +510,7 @@ function idct16x16_dc_add_neon
> >>         ret
> >> endfunc
> >>
> >>-function idct16
> >>+.macro idct16_full
> >>         dmbutterfly0 v16, v24, v16, v24, v2, v3, v4, v5, v6, v7 // v16 = t0a, v24 = t1a
> >>         dmbutterfly v20, v28, v0.h[1], v0.h[2], v2, v3, v4, v5 // v20 = t2a, v28 = t3a
> >>         dmbutterfly v18, v30, v0.h[3], v0.h[4], v2, v3, v4, v5 // v18 = t4a, v30 = t7a
> >>@@ -485,7 +532,10 @@ function idct16
> >>         dmbutterfly0 v22, v26, v22, v26, v2, v3, v18, v19, v30, v31 // v22 = t6a, v26 = t5a
> >>         dmbutterfly v23, v25, v0.h[1], v0.h[2], v18, v19, v30, v31 // v23 = t9a, v25 = t14a
> >>         dmbutterfly v27, v21, v0.h[1], v0.h[2], v18, v19, v30, v31, neg=1 // v27 = t13a, v21 = t10a
> >>+        idct16_end
> >
> >I think it would be clearer if idct16_end is used directly from the macro.
> >it would probably also make sense to move idct16_end and avoid the
> >idct16_full macro. The patch might be smaller and it is immediately
> >obvious that there is no code change but the resulting code is more
> >complicated than it needs to be. same applies to arm if we go with
> >alternative 1.
>
> Ok, so you mean like this?
>
> function idct16
>         dmbutterfly...
>         ....
>         idct16_end
> endfunc

That would be one option. The other would be to move the idct_end
instructions out of the existing idct16 function into a macro and use
that macro there. That would make the full idct structurally identical
to the half and quarter versions and avoid a macro that is only used
once.

Janne
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
