On 2017-02-03 23:44:51 +0200, Martin Storsjö wrote:
> On Fri, 3 Feb 2017, Janne Grunau wrote:
>
> >On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote:
> >>This work is sponsored by, and copyright, Google.
> >>
>
> >>@@ -668,13 +756,40 @@ function \txfm\()16_1d_4x16_pass1_neon
> >>
> >>         mov             r12, #32
> >>         vmov.s16        q2,  #0
> >>+
> >>+.ifc \txfm,idct
> >>+        cmp             r3,  #10
> >>+        ble             3f
> >>+        cmp             r3,  #38
> >>+        ble             4f
> >>+.endif
> >
> >I'd test only for less or equal 38 here
> >
> >>+
> >> .irp i, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
> >>         vld1.16         {d\i}, [r2,:64]
> >>         vst1.16         {d4},  [r2,:64], r12
> >> .endr
> >>
> >>         bl              \txfm\()16
> >>+.ifc \txfm,idct
> >>+        b               5f
> >
> >cmp r3, #10
> >
> >>+
> >>+3:
> >>+.irp i, 16, 17, 18, 19
> >>+        vld1.16         {d\i}, [r2,:64]
> >>+        vst1.16         {d4},  [r2,:64], r12
> >>+.endr
> >>+        bl              idct16_quarter
> >>+        b               5f
> >
> >remove this
> >
> >>+
> >>+4:
> >>+.irp i, 16, 17, 18, 19, 20, 21, 22, 23
> >>+        vld1.16         {d\i}, [r2,:64]
> >>+        vst1.16         {d4},  [r2,:64], r12
> >
> >.if \i == 19
> >blle idct16_half
> >ble 5f
> >.endif
> >
> >saves a little binary space not sure if it's worth it.
>
> Hmm, that looks pretty neat.
>
> I folded in this change into the aarch64 version (and the rshrn instead of
> mov) as well, using a b.gt instead of conditional bl, like this:
>
> .if \i == 19
>         b.gt            4f
>         bl              idct16_quarter
>         b               5f
> 4:
> .endif
>
> In principle I guess one could interleave the same in the full loop as well,
> having only one loop, with special case checks for i == 19 and i == 23. Then
> we'd end up with two comparisons instead of one when doing the full case -
> not sure if it's preferrable or not.

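To illustrate the single-loop variant described above, here is a rough C
model of its control flow only. It is not the NEON code: the function and
type names are made up for this sketch, each loop iteration stands in for
one vld1.16/vst1.16 pair, and r3 is assumed to hold the eob that is
compared against the 10 and 38 limits from the patch.

```c
/* Which 1-D transform the pass ends up calling. */
enum idct16_variant { IDCT16_QUARTER, IDCT16_HALF, IDCT16_FULL };

/* Hypothetical result type for this model only: the variant that was
 * picked and how many input vectors were loaded and cleared first. */
struct pass1_result {
    enum idct16_variant variant;
    int loads;
};

/* Model of one merged load loop with early exits after d19 and d23.
 * "eob" plays the role of r3 (an assumption of this sketch). */
static struct pass1_result idct16_pass1_model(int eob)
{
    struct pass1_result r = { IDCT16_FULL, 0 };

    for (int i = 16; i <= 31; i++) {
        r.loads++;                      /* vld1.16 {d\i} + vst1.16 {d4} */

        if (i == 19 && eob <= 10) {     /* .if \i == 19: quarter case */
            r.variant = IDCT16_QUARTER;
            return r;
        }
        if (i == 23 && eob <= 38) {     /* .if \i == 23: half case */
            r.variant = IDCT16_HALF;
            return r;
        }
    }
    /* Full case: both comparisons above were evaluated and failed. */
    return r;
}
```

In this model the quarter and half cases return after 4 and 8 loads, and
only the full case pays for both comparisons before running all 16 loads.
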
I doubt the comparisons are noticeable, so folding it into the main loop
should be fine.

> The main question though is whether you prefer this or alternative 2.

See my other mail. I have no strong opinion.

Janne
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel