> > Switching to the hard-float ABI certainly does give some benefit. While
> > 20% isn't a trivial difference, it's important to keep this in context.
> > This is on top of what I'd guess is a 10x (i.e. 1000%) speedup achieved
> > without breaking the ABI and requiring a whole new port.
>
> How do you figure a 10x speedup?

A fairly conservative guess at the cost of software floating point. Even
a dog-slow FPU like the one on the Cortex-A8 should be at least an order
of magnitude faster than software.

> > about performance then a NEON optimized version of your critical code
> > should get you another 4x or so on a Cortex-A8.
>
> Yes it's about 4x mathematically but 2x in practice because of the ABI
> fudging.

Theoretical peak gain is way more than 4x. VFP on the A8 has a peak
single precision performance of about 0.1 FLOP/cycle, maybe 0.2 if you
enable runfast mode. NEON peak performance is 4 FLOP/cycle. I've seen
2-3x speedup on plain scalar code without even attempting vectorization,
so 4x seems fairly realistic given a bit of effort.

> >> What would not be so great is that even if it was fixed, the option to
> >> use a faster floating point ABI drags in a clone of
> >> every package on your system (at the very least, libc, libm, and all
> >> the system library dependencies) increasing the
> >> size of the installed system.
> >
> > What you're describing here is multiarch.
>
> Yes, which is needed anyway to support NEON where it's available.

A new port (or arch) is only required if you break the ABI. Enabling
NEON has no effect on the ABI.

Paul
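
To make "a NEON optimized version of your critical code" concrete, here is a minimal sketch of one way such a routine might look. The function name `saxpy` and the choice of kernel are only illustrative assumptions, not code from any package discussed in this thread. It uses the standard `arm_neon.h` intrinsics when the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`) and falls back to plain scalar C otherwise. The function signature, and therefore the ABI, is the same either way.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Illustrative kernel (hypothetical name): y[i] += a * x[i] for n floats. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: handle four single-precision values per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);   /* load 4 floats from x */
        float32x4_t vy = vld1q_f32(y + i);   /* load 4 floats from y */
        vy = vmlaq_n_f32(vy, vx, a);         /* vy = vy + vx * a     */
        vst1q_f32(y + i, vy);                /* store 4 results      */
    }
#endif

    /* Scalar path: the remaining tail, or everything when NEON is absent. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

Built with something like `-mfpu=neon` on a 32-bit ARM GCC toolchain, the vector loop is compiled in. Without it, the same source still builds and produces the same results, which is why callers do not need to change. Actual speedup depends on the workload and is not implied by this sketch.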

