Hi,
On Sun, Mar 4, 2012 at 9:24 AM, Christophe Gisquet
<[email protected]> wrote:
> 2012/2/28 Jason Garrett-Glaser <[email protected]>:
>> The shuffling shouldn't be that bad. Add, shuffle, and multiply
>> typically use three separate execution units, and you have roughly
>> equal amounts of each.
>
> Ronald, I think you mentioned wanting to have a look at this?
[..]
> +cglobal sbr_hf_gen, 4,4,8, X_high, X_low, alpha0, alpha1
> +
> + ; Set pointers
> +%if ARCH_X86_64 == 0
> + ; start and end 6th and 7th args on stack
> + mov r2, [rsp + 28]
> + mov r3, [rsp + 32]
> +%xdefine start r2
> +%xdefine end r3
> +%else
> +
> +%if WIN64
(Use %elif WIN64 here instead of the nested %else / %if; the extra %endif at the end then goes away.)
> + ; 2 last args are first on stack
> + ; xmm6 and xmm7 saved on the stack - account for offset
> + mov r2, [rsp + 96]
> + mov r3, [rsp + 104]
> +%xdefine start r2
> +%xdefine end r3
> +%else
> + ; 6 args in 6 regs
> +%xdefine start r8
> +%xdefine end r9
> +%endif
> +
> +%endif
[..]
> +%define bw m0
> +%if ARCH_X86_64 == 0
> + movss bw, [rsp + (5+1)*4]
> +%else
> + ; First float in xmm0 for x86_64 abis except win64, thus:
> + ; bw already loaded in xmm0 except for win64 where still on stack
> +%if WIN64
> + movss bw, [rsp + 88]
> +%endif
> +%endif
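This does sort of suck: the hand-computed stack offsets are hard to
follow and easy to get wrong. I wonder if you can make this slightly
more understandable by letting cglobal name the arguments, like this: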
%if ARCH_X86_32 || WIN64
cglobal func, 4, 7, 8, X_high, X_low, alpha0, alpha1, bw, start, end
mov startq, startm
mov endq, endm
movss m0, bwm
%else ; UNIX64
cglobal func, 6, 6, 8, X_high, X_low, alpha0, alpha1, start, end
%endif
Unrelated, your code above won't work on x86-32/win64 anyway, since
you're writing start/end into r2/r3 (which are alpha0q/alpha1q there)
before loading alpha0/alpha1 below:
> + ; load alpha factors
[..]
> + movq m2, [alpha1q]
> + movq m1, [alpha0q]
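For illustration, here is a rough, untested sketch of how the two points above could combine. It assumes x86inc.asm's named-argument macros (the startm/endm/bwm stack aliases and startq/endq register aliases that cglobal generates) and that 7 named arguments with only 4 loaded is accepted; the register counts are illustrative, not taken from the final patch. Because start and end get their own registers (r5/r6), r2/r3 still hold alpha0/alpha1 when the alpha factors are loaded.

```asm
%if ARCH_X86_32 || WIN64
; bw, start and end are still on the stack for these ABIs
cglobal sbr_hf_gen, 4, 7, 8, X_high, X_low, alpha0, alpha1, bw, start, end
    mov     startq, startm      ; r5 <- start, r2 (alpha0q) untouched
    mov     endq, endm          ; r6 <- end,   r3 (alpha1q) untouched
    movss   m0, bwm             ; float argument needs movss, not mov
%else ; UNIX64
; 6 integer/pointer args in 6 regs, bw already in xmm0
cglobal sbr_hf_gen, 6, 6, 8, X_high, X_low, alpha0, alpha1, start, end
%endif
    ; load alpha factors (pointers are intact on every ABI)
    movq    m2, [alpha1q]
    movq    m1, [alpha0q]
```

Alternatively, keeping 4 registers and simply moving the alpha loads ahead of the writes to r2/r3 would avoid the clobber as well.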
The loop itself looks OK to me. Sorry for the lame review. ;-)
Ronald