On Wed, Sep 13, 2017 at 7:34 PM, Martin Jambor <mjam...@suse.cz> wrote:
> Hello,
>
> I apologize for not coming back to this, I keep on getting distracted.
> Anyway...
>
> On Tue, Aug 15, 2017 at 02:20:55PM +0000, Joseph Myers wrote:
>> On Tue, 15 Aug 2017, Martin Jambor wrote:
>>
>> > I am not sure what to do about this, to me it seems that the
>> > -ffp-int-builtin-inexact simply has a wrong default value, at least
>> > for x86_64, as it was added in order not to slow code down but does
>> > exactly that (all of the slowdown of course disappears when
>> > -fno-fp-int-builtin-inexact is used).
>> >
>> > Or is the situation somehow more complex?
>>
>> It's supposed to be that -ffp-int-builtin-inexact allows inexact to be
>> raised, and is on by default, and -fno-fp-int-builtin-inexact is the
>> nondefault option that disallows it from being raised and may result in
>> slower code generation.
>>
>> As I understand it, your issue is actually with inline SSE expansions of
>> certain functions.  Before my patch, those had !flag_trapping_math
>> conditionals.  My patch changed that to the logically correct
>> (TARGET_ROUND || !flag_trapping_math || flag_fp_int_builtin_inexact), that
>> being the conditions under which the expansion in question is correct.
>> Your problem is that the expansion, though correct under those conditions,
>> is slow compared to an IFUNC implementation of the library function.
>
> ...that is exactly right (modulo the fact that TARGET_ROUND meanwhile
> became TARGET_SSE4_1.
>
>>
>> Maybe that means that expansion should be disabled under some conditions
>> where it is correct but suboptimal.  It should be kept for TARGET_ROUND,
>> because then it's expanding to a single instruction.  But for
>> !TARGET_ROUND, it's a tuning question (e.g. if tuning for a processor that
>> would satisfy TARGET_ROUND, or for -mtune=generic, and building with
>> recent-enough glibc, the expansion should be avoided as suboptimal, on the
>> expectation that at runtime an IFUNC is likely to be available - or given
>> the size of the generic SSE expansion, maybe it should be avoided more
>> generally than that).
>
> This seems to me the best solution.  SSE 4.1 is 11 years old, we
> should be tuning for it in generic tuning.  That is also the reason
> why I do not think run-time checks for SSE 4.1 or an attempt at an
> internal IFUNC are a good idea (or justified effort).

Well, it's of course the poor-mans solution compared to providing our own
ifunc-enabled libm ...

I would expect that for SSE 4.1 the PLT and call overhead is measurable
and an inline run-time check be quite a bit more efficient.  As you have a
testcase would it be possible to measure that by hand-editing the assembly
(or the benchmark source in case it is not fortran...)?

The whole point of having the inline expansions was to have inline expansions,
avoding the need to spill the whole set of SSE regs around such calls.

> I was just surprised by the glibc check, what would you consider a
> recent-enough glibc?  Or is the check mainly necessary to ensure we
> are indeed using glibc and not some other libc (and thus something
> like we do for TARGET_LIBC_PROVIDES_SSP would do)?
>
> I will try to come up with a patch.

I don't think this is the appropriate solution.  Try disabling the inline
expansion and run SPEC (without -march=sse4.1 of course).

I realize that doing the inline-expansion with a runtime check
is going to be quite tricky and the GCC local IFUNC trick doesn't
solve the inlining (but we might be able to avoid spilling with some
IPA RA help and/or attributes?).

Richard.

> Thanks,
>
> Martin

Reply via email to