Hello,

I apologize for not coming back to this, I keep on getting distracted.
Anyway...

On Tue, Aug 15, 2017 at 02:20:55PM +0000, Joseph Myers wrote:
> On Tue, 15 Aug 2017, Martin Jambor wrote:
> 
> > I am not sure what to do about this, to me it seems that the
> > -ffp-int-builtin-inexact simply has a wrong default value, at least
> > for x86_64, as it was added in order not to slow code down but does
> > exactly that (all of the slowdown of course disappears when
> > -fno-fp-int-builtin-inexact is used).
> > 
> > Or is the situation somehow more complex?
> 
> It's supposed to be that -ffp-int-builtin-inexact allows inexact to be 
> raised, and is on by default, and -fno-fp-int-builtin-inexact is the 
> nondefault option that disallows it from being raised and may result in 
> slower code generation.
> 
> As I understand it, your issue is actually with inline SSE expansions of 
> certain functions.  Before my patch, those had !flag_trapping_math 
> conditionals.  My patch changed that to the logically correct 
> (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact), that 
> being the conditions under which the expansion in question is correct.  
> Your problem is that the expansion, though correct under those conditions, 
> is slow compared to an IFUNC implementation of the library function.

...that is exactly right (modulo the fact that TARGET_ROUND meanwhile
became TARGET_SSE4_1.

> 
> Maybe that means that expansion should be disabled under some conditions 
> where it is correct but suboptimal.  It should be kept for TARGET_ROUND, 
> because then it's expanding to a single instruction.  But for 
> !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that 
> would satisfy TARGET_ROUND, or for -mtune=generic, and building with 
> recent-enough glibc, the expansion should be avoided as suboptimal, on the 
> expectation that at runtime an IFUNC is likely to be available - or given 
> the size of the generic SSE expansion, maybe it should be avoided more 
> generally than that).

This seems to me the best solution.  SSE 4.1 is 11 years old, we
should be tuning for it in generic tuning.  That is also the reason
why I do not think run-time checks for SSE 4.1 or an attempt at an
internal IFUNC are a good idea (or justified effort).

I was just surprised by the glibc check, what would you consider a
recent-enough glibc?  Or is the check mainly necessary to ensure we
are indeed using glibc and not some other libc (and thus something
like we do for TARGET_LIBC_PROVIDES_SSP would do)?

I will try to come up with a patch.

Thanks,

Martin

Reply via email to