On 2017-02-03 23:44:51 +0200, Martin Storsjö wrote:
> On Fri, 3 Feb 2017, Janne Grunau wrote:
> 
> >On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote:
> >>This work is sponsored by, and copyright, Google.
> >>
> 
> >>@@ -668,13 +756,40 @@ function \txfm\()16_1d_4x16_pass1_neon
> >>
> >>         mov             r12, #32
> >>         vmov.s16        q2, #0
> >>+
> >>+.ifc \txfm,idct
> >>+        cmp             r3,  #10
> >>+        ble             3f
> >>+        cmp             r3,  #38
> >>+        ble             4f
> >>+.endif
> >
> >I'd test only for less or equal 38 here
> >
> >>+
> >> .irp i, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
> >>         vld1.16         {d\i}, [r2,:64]
> >>         vst1.16         {d4},  [r2,:64], r12
> >> .endr
> >>
> >>         bl              \txfm\()16
> >>+.ifc \txfm,idct
> >>+        b               5f
> >
> >cmp             r3,  #10
> >
> >>+
> >>+3:
> >>+.irp i, 16, 17, 18, 19
> >>+        vld1.16         {d\i}, [r2,:64]
> >>+        vst1.16         {d4},  [r2,:64], r12
> >>+.endr
> >>+        bl              idct16_quarter
> >>+        b               5f
> >
> >remove this
> >
> >>+
> >>+4:
> >>+.irp i, 16, 17, 18, 19, 20, 21, 22, 23
> >>+        vld1.16         {d\i}, [r2,:64]
> >>+        vst1.16         {d4},  [r2,:64], r12
> >
> >.if \i == 19
> >blle idct16_quarter
> >ble  5f
> >.endif
> >
> >saves a little binary space, not sure if it's worth it.
> 
> Hmm, that looks pretty neat.
> 
> I folded this change into the aarch64 version (and the rshrn instead of
> mov) as well, using a b.gt instead of conditional bl, like this:
> 
> .if \i == 19
>         b.gt            4f
>         bl              idct16_quarter
>         b               5f
> 4:
> .endif
> 
> In principle I guess one could interleave the same in the full loop as well,
> having only one loop, with special case checks for i == 19 and i == 23. Then
> we'd end up with two comparisons instead of one when doing the full case -
> not sure if it's preferable or not.

I doubt the comparisons are noticeable, so folding it into the main loop 
should be fine.
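
Roughly like this for the arm version. This is an untested sketch, not a
definitive implementation: it assumes the 5: label directly follows the
final bl as in the patch, and that idct16_quarter and idct16_half leave
the condition flags untouched across the call.

        mov             r12, #32
        vmov.s16        q2, #0
.ifc \txfm,idct
        cmp             r3,  #10        @ flags survive the vld1/vst1 below
.endif
.irp i, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
        vld1.16         {d\i}, [r2,:64]
        vst1.16         {d4},  [r2,:64], r12
.ifc \txfm,idct
.if \i == 19
        blle            idct16_quarter  @ r3 <= 10: d16-d19 are enough
        ble             5f
        cmp             r3,  #38        @ second comparison, only if r3 > 10
.endif
.if \i == 23
        blle            idct16_half     @ r3 <= 38: d16-d23 are enough
        ble             5f
.endif
.endif
.endr

        bl              \txfm\()16      @ full case, after two comparisons
5: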

> The main question though is whether you prefer this or alternative 2.

See my other mail. I have no strong opinion.

Janne
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel