Re: Asm volatile causing performance regressions on ARM

Richard Sandiford Thu, 27 Feb 2014 09:04:01 -0800

Yury Gribov <y.gri...@samsung.com> writes:
> Richard Biener wrote:
>>>> If this behavior is not intended, what would be the best way to fix
>>>> performance? I could teach GCC to not remove constant RTXs in
>>>> flush_hash_table() but this is probably very naive and won't cover some
>>>> corner-cases.
>>>
>>> That could be a good starting point though.
>>
>> Though with modifying "machine state" you can modify constants as well, no?
>
> Valid point but this would mean relying on compiler to always load all 
> constants from memory (instead of, say, generating them via movhi/movlo) 
> for a piece of code which looks extremely unstable.


Right.  And constant rtx codes have mode-independent semantics.
(const_int 1) is always 1, whatever a volatile asm does.  Same for
const_double, symbol_ref, label_ref, etc.  If a constant load is implemented
using some mode-dependent operation then it would need to be represented
as something like an unspec instead.  But even then, the result would
usually be annotated with a REG_EQUAL note giving the value of the final
register result.  It should be perfectly OK to reuse that register after
a volatile asm if the value in the REG_EQUAL note is needed again.
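
As a minimal sketch of the constant case (the function name and the
literal are made up for illustration, and an empty asm string is used so
that it assembles on any target):

```c
/* Hypothetical example: the same literal is stored before and after a
   volatile asm.  __asm__/__volatile__ are the GCC spellings that also
   work under strict -std= modes.  */
void
store_const_twice (int *y)
{
  y[0] = 0x12345678;
  __asm__ __volatile__ ("");   /* says nothing about clobbered registers */
  y[1] = 0x12345678;
}
```

Whatever the asm does, 0x12345678 still means 0x12345678, so a register
set up for the first store could in principle be reused for the second.
Whether a given GCC version actually reuses it is not something this
sketch demonstrates.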

> What is the general attitude towards volatile asm? Are people interested 
> in making it more defined/performant or should we just leave this can of 
> worms as is? I can try to improve generated code but my patches will be 
> doomed if there is no consensus on what volatile asm actually means...

I think part of the problem is that some parts of GCC (like the one you
noted) are far more conservative than others.  E.g. take:

  void foo (int x, int *y)
  {
    y[0] = x + 1;
    asm volatile ("# asm");
    y[1] = x + 1;
  }

The extra-paranoid check you pointed out means that we assume that
x + 1 is no longer available after the asm for rtx-level CSE, but take
the opposite view for tree-level CSE, which happily optimises away the
second +.

Some places were (maybe still are) worried that volatile asms could
clobber any register they like.  But the register allocator assumes that
registers are preserved across volatile asms unless explicitly clobbered.
And AFAIK it always has.  So in the above example we get:

        addl    $1, %edi
        movl    %edi, (%rsi)
#APP
# 4 "/tmp/foo.c" 1
        # asm
# 0 "" 2
#NO_APP
        movl    %edi, 4(%rsi)
        ret

with %edi being live across the asm.

We do nothing this draconian for a normal function call, which could
easily use a volatile asm internally.  IMO anything that isn't flushed
for a call shouldn't be flushed for a volatile asm either.
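
To make the comparison concrete, here is a rough sketch of the two cases
side by side.  The names barrier, foo_call and foo_asm are hypothetical,
and the noinline attribute is only there so the call stays a real call:

```c
/* Hypothetical helper: an opaque function whose body is just a
   volatile asm.  */
__attribute__ ((noinline)) void
barrier (void)
{
  __asm__ __volatile__ ("");
}

/* x + 1 is needed on both sides of an ordinary call.  */
void
foo_call (int x, int *y)
{
  y[0] = x + 1;
  barrier ();
  y[1] = x + 1;
}

/* The same computation with the volatile asm written inline.  */
void
foo_asm (int x, int *y)
{
  y[0] = x + 1;
  __asm__ __volatile__ ("");
  y[1] = x + 1;
}
```

Both functions must store the same values; the argument above is only
that the second should not be optimised more pessimistically than the
first.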

One of the big grey areas is what should happen for floating-point ops
that depend on the current rounding mode.  That isn't really modelled
properly yet though.  Again, it affects calls as well as volatile asms.

Thanks,
Richard
