On Sun, 13 Jul 2014 10:10:00 +0100, Martin Storsjö <[email protected]> wrote:
On Fri, 11 Jul 2014, Ben Avison wrote:
I tried this code on most of our odd configurations, and ran into a
bunch of issues, most of which I've been able to resolve in one way or
another, but some parts of it is a bit ugly...
Thanks Martin. It might seem like it sometimes, but I don't deliberately
set out to break gas-preprocessor with every new patch series :)
It's a shame we have to worry about getting this code to assemble
correctly in Thumb mode, as it seems rather unlikely that it will ever
actually be used that way - most CPUs either don't support enough Thumb
instructions for it to assemble, or have NEON that will be used in
preference, or don't support short vector VFP mode, which the code
requires.
Nevertheless, I have made the changes you suggest - probably the same as
the ones you made locally. I'll repost this patch shortly.
+A .word ff_fft16_vfp @ this one alone is exported
This reference needs to be wrapped in X()
I've only ever seen X() resolve to a no-op, so I don't get this sort of
thing flagged during my own tests. When does it make any difference? I
was never clear on whether it only applied to labels from *other*
compilation units rather than the current one (made less clear by the
fact that it never seems to be used when a label is defined), or whether
it applied just to code labels, or both code and data. The name "X"
doesn't really give much of a hint!
.align 3
Is there any particular reason why this is aligned to 8 bytes instead
of 4 - shouldn't 4 (aka .align 2) be enough for float constants? (Yes,
I know this isn't added by this patch though.)
It's because the ARM11's memory bus is actually 64 bits wide. cos1pi4 and
cos1pi8 are loaded in the same vldr instruction; if cos1pi4 is 64-bit
aligned then this takes 1 cycle, otherwise 2 cycles. This multiplies up,
so for something like FFT512, that's quite a few extra cycles.
+ ldr v5, =\costable
Should this perhaps use the movrelx macro?
Um, maybe. I must admit to not really understanding what the movrel and
movrelx macros are for and when to use which. Could I respectfully ask
for some comments in asm.S to explain this to those of us who aren't
familiar with the full range of supported ABIs?
Nevertheless, wouldn't this be more readable written just as a normal
gas macro instead of as a cpp macro? That's how the corresponding macro
is done in fft_neon.S.
I couldn't figure out how to do string concatenations in gas - it's not
my favourite assembly language and I often find string manipulation is
one of its weaknesses. Nevertheless, the trick in fft_neon.S seems to
work, so thanks for the pointer.
Ben
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel