On Wed, Aug 16, 2017 at 12:55 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Wed, Aug 16, 2017 at 12:51 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>>> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>
>>>>>> So I'd try the "easy" way of expanding if (__builtin_cpu_supports ("sse4.1"))
>>>>>> as the sse4.1 sequence is just a single instruction.  The interesting part
>>>>>> of the story will be to make sure we can emit that even if ! TARGET_ROUND ...
>>>>>>
>>>>>> Uros, any idea how to accomplish this?  Or is the idea of a "local" ifunc
>>>>>> better?  Note the ABI boundary will be expensive but I guess the conditional
>>>>>> sequence as well (and it will disturb RA even if predicted to have SSE 4.1).
>>>>>
>>>>> TARGET_ROUND is just:
>>>>>
>>>>> /* SSE4.1 defines round instructions */
>>>>> #define    OPTION_MASK_ISA_ROUND    OPTION_MASK_ISA_SSE4_1
>>>>> #define    TARGET_ISA_ROUND    ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
>>>>>
>>>>> I don't remember the history around the #define; once upon a time it
>>>>> probably made sense, but nowadays it looks like it can simply be
>>>>> substituted with TARGET_SSE4_1.
>>>>
>>>> Sure but we want the backend to use a TARGET_ROUND guarded define_insn
>>>> when TARGET_ROUND is false but inside a runtime conditional ensuring that
>>>> TARGET_ROUND is satisfied.  When doing this with ifuncs we'd mark the function
>>>> with a proper target attribute but within a function?
>>>
>>> How about something intrinsic headers are using?
>>
>> (... somehow managed to press send too early ...)
>>
>> There we use the GCC push_options and GCC target pragmas. Maybe we also
>> need a corresponding __ROUND__ define defined by the compiler.
>
> Those don't work inside a function.  Remember I want to change the expander
> of ceil () to
>
>  if (__builtin_cpu_supports ("sse4.1"))
>    ceil_for_sse4.1 ();
>  else
>    ceil ();
>
> from the x86 target code that expands ceil for ! TARGET_ROUND.  I suppose
> we could simply use a separate pattern for SSE 4.1 roundsd here (does it
> have to be an unspec?  I suppose so to prevent it from being generated by
> other means and to prevent code motion out of the conditional?)
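
At the source level, the inline conditional described above could look roughly
like the following. This is only a sketch under assumptions: ceil_dispatch,
ceil_sse41 and ceil_generic are made-up names, the target attribute stands in
for whatever the expander would do internally, and the hand-written fallback
replaces the libm call only to keep the example free of link dependencies.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_ceil_sd */

/* Only this function is compiled with SSE4.1 enabled, so the rest of the
   translation unit can still be built without -msse4.1.  */
static __attribute__((target("sse4.1"))) double ceil_sse41(double x)
{
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_ceil_sd(v, v));   /* the rounding is one roundsd */
}
#endif

/* Hypothetical fallback for CPUs without SSE4.1.  It does not preserve the
   sign of zero (ceil(-0.5) yields +0.0 here, not -0.0).  */
static double ceil_generic(double x)
{
    /* NaN, infinities and |x| >= 2^52 are already integral.  */
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;
    double t = (double)(long long)x;   /* truncate toward zero */
    if (t < x)
        t += 1.0;                      /* round up if truncation went down */
    return t;
}

/* Runtime conditional: take the SSE4.1 path only when the CPU has it.  */
double ceil_dispatch(double x)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1"))
        return ceil_sse41(x);
#endif
    return ceil_generic(x);
}
```

Keeping the SSE4.1 instruction in its own function sidesteps the code-motion
question at the source level; inside the compiler the same guarantee would
have to come from the pattern itself.
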
>
> Or forgo the idea of using inline conditional code and emit an ifunc
> dispatcher, a function with the sse4.1 instruction, and a call to the dispatcher
> ourselves.
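
For comparison, the ifunc alternative could be sketched at the source level as
below. The names my_ceil, resolve_my_ceil, ceil_sse41 and ceil_generic are
hypothetical, and the USE_IFUNC guard is an assumption of this sketch: GNU
ifunc needs support from the toolchain and the C library, so the example only
uses it on x86 ELF targets with glibc and otherwise falls back to a plain
function.

```c
#include <stdlib.h>   /* any libc header; makes __GLIBC__ visible below */

#if (defined(__x86_64__) || defined(__i386__)) \
    && defined(__ELF__) && defined(__GLIBC__)
#define USE_IFUNC 1
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_ceil_sd */
#endif

/* Hypothetical fallback; does not preserve the sign of zero.  */
static double ceil_generic(double x)
{
    /* NaN, infinities and |x| >= 2^52 are already integral.  */
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;
    double t = (double)(long long)x;   /* truncate toward zero */
    if (t < x)
        t += 1.0;
    return t;
}

#ifdef USE_IFUNC
/* The function carrying the SSE4.1 instruction.  */
static __attribute__((target("sse4.1"))) double ceil_sse41(double x)
{
    __m128d v = _mm_set_sd(x);
    return _mm_cvtsd_f64(_mm_ceil_sd(v, v));
}

/* Resolver: run once by the dynamic loader when my_ceil is bound.  */
static double (*resolve_my_ceil(void))(double)
{
    __builtin_cpu_init();   /* needed before __builtin_cpu_supports here */
    return __builtin_cpu_supports("sse4.1") ? ceil_sse41 : ceil_generic;
}

/* Callers see an ordinary function; the choice is made at load time.  */
double my_ceil(double x) __attribute__((ifunc("resolve_my_ceil")));
#else
double my_ceil(double x) { return ceil_generic(x); }
#endif
```

Here every call crosses a function-call boundary, which is the ABI cost
mentioned earlier in the thread, but there is no per-call feature test.
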

Hm ...

Maybe in this case an example from libatomic, showing how cmpxchg16 is
handled, comes in handy.
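
A cached CPUID check of that general shape might look like the sketch below.
This is not libatomic's actual code: feat1_ecx, init_cpu_features and
cpu_has_sse41 are made-up names, and reading the feature word in a constructor
is an assumption of this example. The bit positions are the documented
CPUID leaf 1 ECX bits.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* __get_cpuid */
#endif

/* CPUID leaf 1 reports SSE4.1 in ECX bit 19 (CMPXCHG16B is ECX bit 13).  */
#define FEAT1_ECX_SSE4_1 (1u << 19)

/* Hypothetical cache of the CPUID.1:ECX feature word, filled at startup.  */
static unsigned int feat1_ecx;

/* Runs before main() and queries the CPU exactly once.  */
static __attribute__((constructor)) void init_cpu_features(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        feat1_ecx = ecx;
#endif
}

/* Returns 1 if the running CPU reports SSE4.1, 0 otherwise.  */
int cpu_has_sse41(void)
{
    return (feat1_ecx & FEAT1_ECX_SSE4_1) != 0;
}
```

On non-x86 targets the cached word stays zero, so cpu_has_sse41 () simply
returns 0.
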

Uros.
