On 2017-02-03 23:44:51 +0200, Martin Storsjö wrote:
> On Fri, 3 Feb 2017, Janne Grunau wrote:
>
> >On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote:
> >>This work is sponsored by, and copyright, Google.
> >>
>
> >>@@ -668,13 +756,40 @@ function \txfm\()16_1d_4x16_pass1_neon
> >>
> >> mov r12, #32
> >> vmov.s16 q2, #0
> >>+
> >>+.ifc \txfm,idct
> >>+ cmp r3, #10
> >>+ ble 3f
> >>+ cmp r3, #38
> >>+ ble 4f
> >>+.endif
> >
> >I'd only test for less than or equal to 38 here
> >
> >>+
> >> .irp i, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
> >> vld1.16 {d\i}, [r2,:64]
> >> vst1.16 {d4}, [r2,:64], r12
> >> .endr
> >>
> >> bl \txfm\()16
> >>+.ifc \txfm,idct
> >>+ b 5f
> >
> >cmp r3, #10
> >
> >>+
> >>+3:
> >>+.irp i, 16, 17, 18, 19
> >>+ vld1.16 {d\i}, [r2,:64]
> >>+ vst1.16 {d4}, [r2,:64], r12
> >>+.endr
> >>+ bl idct16_quarter
> >>+ b 5f
> >
> >remove this
> >
> >>+
> >>+4:
> >>+.irp i, 16, 17, 18, 19, 20, 21, 22, 23
> >>+ vld1.16 {d\i}, [r2,:64]
> >>+ vst1.16 {d4}, [r2,:64], r12
> >
> >.if \i == 19
> >blle idct16_half
> >ble 5f
> >.endif
> >
> >Saves a little binary space; not sure if it's worth it.
>
> Hmm, that looks pretty neat.
>
> I folded this change into the aarch64 version as well (along with the rshrn
> instead of mov), using a b.gt instead of a conditional bl, like this:
>
> .if \i == 19
> b.gt 4f
> bl idct16_quarter
> b 5f
> 4:
> .endif
>
> In principle I guess one could interleave the same in the full loop as well,
> having only one loop, with special case checks for i == 19 and i == 23. Then
> we'd end up with two comparisons instead of one when doing the full case -
> not sure if it's preferable or not.

I doubt the comparisons are noticeable, so folding them into the main loop
should be fine.
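
To make the single-loop variant concrete, here is a rough control-flow model
in Python rather than assembly. It is only a sketch: the thresholds 10 and 38
and the names idct16_quarter, idct16_half and idct16 come from the patch
above, while pass1_dispatch and its return value are hypothetical and exist
only for illustration. It assumes the eob value is what sits in r3.

```python
def pass1_dispatch(eob, txfm="idct"):
    """Model one load loop with early exits after d19 and d23.

    Returns the list of d-register indices loaded and the name of the
    transform that would be called. Hypothetical helper, not real code
    from the patch.
    """
    loaded = []
    for i in range(16, 32):
        # Stands in for the vld1.16 / vst1.16 pair of the .irp loop.
        loaded.append(i)
        if txfm == "idct":
            # First special case: only the first quarter of the input is used.
            if i == 19 and eob <= 10:
                return loaded, "idct16_quarter"
            # Second special case: only the first half of the input is used.
            if i == 23 and eob <= 38:
                return loaded, "idct16_half"
    # Full case: both comparisons above were evaluated and failed.
    return loaded, txfm + "16"
```

In this model the full case pays for two comparisons instead of one, which
is the cost discussed above; the non-idct transforms skip both checks, as
the .ifc guard in the patch does.
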
> The main question though is whether you prefer this or alternative 2.

See my other mail. I have no strong opinion.

Janne