On Wed, Aug 16, 2017 at 12:55 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Wed, Aug 16, 2017 at 12:51 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Wed, Aug 16, 2017 at 12:48 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Wed, Aug 16, 2017 at 12:43 PM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Tue, Aug 15, 2017 at 9:21 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>>> On Tue, Aug 15, 2017 at 4:59 PM, Richard Biener
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>
>>>>>> So I'd try the "easy" way of expanding
>>>>>> if (__builtin_cpu_supports ("sse4.1"))
>>>>>> as the sse4.1 sequence is just a single instruction.  The
>>>>>> interesting part of the story will be to make sure we can emit
>>>>>> that even if ! TARGET_ROUND ...
>>>>>>
>>>>>> Uros, any idea how to accomplish this?  Or is the idea of a "local"
>>>>>> ifunc better?  Note the ABI boundary will be expensive but I guess
>>>>>> the conditional sequence as well (and it will disturb RA even if
>>>>>> predicted to have SSE 4.1).
>>>>>
>>>>> TARGET_ROUND is just:
>>>>>
>>>>> /* SSE4.1 defines round instructions */
>>>>> #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
>>>>> #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
>>>>>
>>>>> I don't remember the history around the #define, once upon a time
>>>>> probably made sense, but nowadays it looks that it can be simply
>>>>> substituted with TARGET_SSE4_1.
>>>>
>>>> Sure but we want the backend to use a TARGET_ROUND guarded define_insn
>>>> when TARGET_ROUND is false but inside a runtime conditional ensuring
>>>> that TARGET_ROUND is satisfied.  With doing this with ifuncs we'd mark
>>>> the function with a proper target attribute but within a function?
>>>
>>> How about something intrinsic headers are using?
>>
>> (... somehow managed to press send too early ...)
>>
>> There we use GCC_push_options and GCC_target pragmas.  Maybe we also
>> need corresponding __ROUND__ define defined by the compiler.
>
> Those don't work inside a function.  Remember I want to change the
> expander of ceil () to
>
>   if (__builtin_cpu_supports ("sse4.1"))
>     ceil_for_sse4.1 ();
>   else
>     ceil ();
>
> from the x86 target code that expands ceil for ! TARGET_ROUND.  I suppose
> we could simply use a separate pattern for SSE 4.1 roundsd here (does it
> have to be an unspec?  I suppose so to prevent it from being generated by
> other means and to prevent code motion out of the conditional?)
>
> Or forgo with the idea to use inline conditional code and emit an ifunc
> dispatcher, a function with the sse4.1 instruction, and a call to the
> dispatcher ourselves.
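
For reference, at the source level the conditional described above looks roughly like the sketch below. It is only an illustration under assumptions, not what the expander would emit: ceil_sse41, ceil_generic and ceil_dispatch are made-up names, ceil_generic is a simplified stand-in for the libm ceil () call so that the snippet links without -lm, and the target attribute is the per-function mechanism that already works (the open question is how to get the same effect inside a single function).

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>   /* SSE4.1 intrinsics: _mm_ceil_sd */

/* Hypothetical helper: only this function is compiled with SSE4.1
   enabled, so the rest of the translation unit needs no -msse4.1.  */
__attribute__((target("sse4.1")))
static double ceil_sse41(double x)
{
    __m128d v = _mm_set_sd(x);
    v = _mm_ceil_sd(v, v);        /* a single roundsd instruction */
    return _mm_cvtsd_f64(v);
}
#endif

/* Hypothetical stand-in for the libm ceil () fallback, kept in plain C
   so the sketch has no link-time dependency.  The sign of a zero
   result may differ from libm.  */
static double ceil_generic(double x)
{
    /* NaN and values with magnitude >= 2^52 have no fraction to strip. */
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;
    double t = (double)(long long)x;   /* truncates toward zero */
    return (t < x) ? t + 1.0 : t;
}

/* Runtime conditional: take the SSE4.1 path only when the CPU
   reports support for it, otherwise use the generic path.  */
double ceil_dispatch(double x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse4.1"))
        return ceil_sse41(x);
#endif
    return ceil_generic(x);
}
```

Note that the branch and the call into a separately-attributed function are exactly the costs mentioned earlier in the thread (ABI boundary, disturbed RA); the sketch shows the semantics, not a way around them.
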
Hm ... Maybe in this case the example from libatomic, how cmpxchg16 is handled, comes in handy.

Uros.