On Sat, 4 Feb 2017, Janne Grunau wrote:

On 2017-02-03 23:44:51 +0200, Martin Storsjö wrote:
On Fri, 3 Feb 2017, Janne Grunau wrote:

>On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote:
>>This work is sponsored by, and copyright, Google.
>>

>>@@ -668,13 +756,40 @@ function \txfm\()16_1d_4x16_pass1_neon
>>
>>         mov             r12, #32
>>         vmov.s16        q2, #0
>>+
>>+.ifc \txfm,idct
>>+        cmp             r3,  #10
>>+        ble             3f
>>+        cmp             r3,  #38
>>+        ble             4f
>>+.endif
>
>I'd test only for less or equal 38 here
>
>>+
>> .irp i, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
>>         vld1.16         {d\i}, [r2,:64]
>>         vst1.16         {d4},  [r2,:64], r12
>> .endr
>>
>>         bl              \txfm\()16
>>+.ifc \txfm,idct
>>+        b               5f
>
>cmp             r3,  #10
>
>>+
>>+3:
>>+.irp i, 16, 17, 18, 19
>>+        vld1.16         {d\i}, [r2,:64]
>>+        vst1.16         {d4},  [r2,:64], r12
>>+.endr
>>+        bl              idct16_quarter
>>+        b               5f
>
>remove this
>
>>+
>>+4:
>>+.irp i, 16, 17, 18, 19, 20, 21, 22, 23
>>+        vld1.16         {d\i}, [r2,:64]
>>+        vst1.16         {d4},  [r2,:64], r12
>
>.if \i == 19
>blle idct16_quarter
>ble  5f
>.endif
>
>saves a little binary space, not sure if it's worth it.

Hmm, that looks pretty neat.

I folded this change (and the rshrn instead of mov) into the aarch64
version as well, using a b.gt instead of a conditional bl, like this:

.if \i == 19
        b.gt            4f
        bl              idct16_quarter
        b               5f
4:
.endif

In principle I guess one could interleave the same in the full loop as well,
having only one loop, with special case checks for i == 19 and i == 23. Then
we'd end up with two comparisons instead of one when doing the full case -
not sure if it's preferable or not.

I doubt the comparisons are noticeable, so folding it into the main loop should be fine.

Hmm, indeed. And in this case, the diff of this alternative turns out pretty small and neat actually.
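
For illustration, a rough sketch of what the single-loop variant could look
like on the ARM side, reusing the registers and eob thresholds from the quoted
hunk. This is not the actual diff: the labels 6 and 7 are made up for the
sketch, the idct16_half call is assumed from the review comment above, and the
fragment is assumed to sit inside the macro that defines \txfm.

.irp i, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
        vld1.16         {d\i}, [r2,:64]
        vst1.16         {d4},  [r2,:64], r12
.ifc \txfm,idct
.if \i == 19
        @ d16-d19 loaded; with eob <= 10 the quarter transform is enough
        cmp             r3,  #10
        bgt             6f
        bl              idct16_quarter
        b               5f
6:
.endif
.if \i == 23
        @ d16-d23 loaded; with eob <= 38 the half transform is enough
        cmp             r3,  #38
        bgt             7f
        bl              idct16_half
        b               5f
7:
.endif
.endif
.endr
        @ full case: both comparisons above were executed and not taken
        bl              \txfm\()16
5:

The full case pays for two cmp/bgt pairs, while the short cases stop loading
and clearing coefficients as soon as enough input has been read.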

// Martin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel