2012/11/30 Loren Merritt <[email protected]>:
> cpu is more relevant than os.
I will amend the commit message, but then I may as well mention both the CPU and the OS in each commit.
>> +; r0q=Y r1q=s_m r2q=q_filt r3q=noise r4q=max_m
>> +cglobal hf_apply_noise_main
>
> You can invoke DEFINE_ARGS even if not generating a prologue.
I didn't know about DEFINE_ARGS; I will use it.
>> + movh m3, [r1q + r4q]
>> + movh m4, [r1q + r4q + 8]
>
> Can these be a single aligned load?
Yes, but I'm probably missing a trick here, because changing the
above and the following code to this:
movu m3, [s_mq + max_mq]
mova m4, m3
unpcklps m3, m3
unpckhps m4, m4
is slower (movhlps/unpcklps is even slower).
Is there a way to do that in 3 instructions?
>> + cmpps m6, m5, 0 ; m1 == 0
>> + cmpps m7, m5, 0 ; m1 == 0
>
> You mean m7 == 0?
Will correct; it is a remnant of the code from before unrolling.
>> +cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
>> + mova m0, [ps_noise0]
>> + mov r4d, m_maxm
>> + call hf_apply_noise_main
>> + RET
>
> TAIL_CALL hf_apply_noise_main, 1
Which makes me think that every caller should have the same epilogue
(same stack offset, etc.). Is there a way to just do a jmp here and let
the "jumpee" do the epilogue?
Another thing I'm wondering about (I can't check it for the next 4 days):
mov r4d, m_maxm
If I'm not mistaken, m_max should already be in r5 under the Linux
x86_64/amd64 ABI (whatever I should call it).
So I could save that mov and have hf_apply_noise_main use r5 instead
under that condition.
Does that make sense?
--
Christophe