Hi, Segher
> > > > In pr84524.c we got a loop with an extended inline asm:
> > > > asm volatile ("" : "+r" (v))
> > > > which also gives us a “surprising” situation Alexander predicts.
> > > >
> > > > For sched-deps scanner such volatile asm is a “true barrier” and we
> > > > create dependencies to almost all other instructions, while DF scanners
> > > > don’t give us such information.
> > >
> > > There is no such thing as a "true barrier" in extended asm. The only
> > > thing
> > > volatile asm means is that the asm has a side effect unknown to the
> > > compiler;
> > > this can *not* be a modification of a register or of memory contents, such
> > > things are known by the compiler, that's what clobbers and "memory"
> > > clobber
> > > are about.
> >
> > In sched-deps.c we got:
> > case ASM_OPERANDS:
> > case ASM_INPUT:
> > {
> > /* Traditional and volatile asm instructions must be considered to
> > use
> > and clobber all hard registers, all pseudo-registers and all of
> > memory. So must TRAP_IF and UNSPEC_VOLATILE operations.
> >
> > Consider for instance a volatile asm that changes the fpu rounding
> > mode. An insn should not be moved across this even if it only
> > uses
> > pseudo-regs because it might give an incorrectly rounded result.
> > */
> > if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x))
> > && !DEBUG_INSN_P (insn))
> > reg_pending_barrier = TRUE_BARRIER;
> > ...
>
> This code was added in 1997 (r14770). In 2004 the documentation was
> changed to clarify how things really work (r88999):
>
> "Note that even a volatile @code{asm} instruction can be moved relative to
> other code, including across jump instructions."
>
> (followed by an example exactly about what this means for FPU control).
Thanks for pointing to that changes! Unfortunately, sched-deps.c was
more conservative this 15 years...
Let’s try to fix it.
> > > > Maybe it is a good idea somehow re-implement modulo scheduler using only
> > > > one scanner instead of two, but at the moment the best thing to do is to
> > > > detect the situation earlier and skip such loops.
> > >
> > > Or fix the broken code...
> >
> > I’m not sure what you mean here,
>
> I mean have the modulo scheduler implement the correct asm semantics, not
> some more restrictive thing that gets it into conflicts with DF, etc.
>
> I don't think this will turn out to be a problem in any way. Some invalid
> asm will break, sure.
I have started with applying this without any SMS changes:
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -2753,22 +2753,14 @@ sched_analyze_2 (struct deps_desc *deps, rtx
x, rtx_insn *insn)
case UNSPEC_VOLATILE:
flush_pending_lists (deps, insn, true, true);
+
+ if (!DEBUG_INSN_P (insn))
+ reg_pending_barrier = TRUE_BARRIER;
/* FALLTHRU */
case ASM_OPERANDS:
case ASM_INPUT:
{
- /* Traditional and volatile asm instructions must be considered to use
- and clobber all hard registers, all pseudo-registers and all of
- memory. So must TRAP_IF and UNSPEC_VOLATILE operations.
-
- Consider for instance a volatile asm that changes the fpu rounding
- mode. An insn should not be moved across this even if it only uses
- pseudo-regs because it might give an incorrectly rounded result. */
- if ((code != ASM_OPERANDS || MEM_VOLATILE_P (x))
- && !DEBUG_INSN_P (insn))
- reg_pending_barrier = TRUE_BARRIER;
-
/* For all ASM_OPERANDS, we must traverse the vector of input operands.
We cannot just fall through here since then we would be confused
by the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate
Regstrapping it on x86-64 shows some failures. First is with ms-sysv
abi test and can be solved like this:
diff --git a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
--- a/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
+++ b/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/gen.cc
@@ -511,7 +511,7 @@ void make_do_test (const vector<class arg> &args,
/* End if init_test call. */
if (f.get_realign () && unaligned == 1)
- out << " __asm__ __volatile__ (\"subq $8,%%rsp\":::\"cc\");"
+ out << " __asm__ __volatile__ (\"subq
$8,%%rsp\":::\"cc\", \"memory\");"
<< endl;
out << " ret = do_test_"
@@ -525,7 +525,7 @@ void make_do_test (const vector<class arg> &args,
out << ");" << endl;
if (f.get_realign () && unaligned == 1)
- out << " __asm__ __volatile__ (\"addq $8,%%rsp\":::\"cc\");"
+ out << " __asm__ __volatile__ (\"addq
$8,%%rsp\":::\"cc\", \"memory\");"
<< endl;
out << " check_results (ret);" << endl;
Here we have some asms which control stack pointer (sigh!). It
certainly may be broken at any moment, but right now “memory” clobber
fixes everything for me.
Another failure is:
FAIL: gcc.dg/guality/pr58791-4.c -O2 -DPREVENT_OPTIMIZATION line
pr58791-4.c:32 i == 486
FAIL: gcc.dg/guality/pr58791-4.c -O2 -DPREVENT_OPTIMIZATION line
pr58791-4.c:32 i2 == 487
FAIL: gcc.dg/guality/pr58791-4.c -O3 -g -DPREVENT_OPTIMIZATION
line pr58791-4.c:32 i == 486
FAIL: gcc.dg/guality/pr58791-4.c -O3 -g -DPREVENT_OPTIMIZATION
line pr58791-4.c:32 i2 == 487
FAIL: gcc.dg/guality/pr58791-4.c -Os -DPREVENT_OPTIMIZATION line
pr58791-4.c:32 i == 486
FAIL: gcc.dg/guality/pr58791-4.c -Os -DPREVENT_OPTIMIZATION line
pr58791-4.c:32 i2 == 487
FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i ==
486
FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fno-use-linker-plugin
-flto-partition=none -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 ==
487
FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i ==
486
FAIL: gcc.dg/guality/pr58791-4.c -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects -DPREVENT_OPTIMIZATION line pr58791-4.c:32 i2 ==
487
It is connected to debug-info, and I cannot solve it myself. I am not
sure how this should work when we try to print dead-code variable (i2)
while debugging -O2 (O3/Os) compiled executable. Jakub created that
test, he is in CC already.
I will also look at aarch64 regstrap a bit later. But that job should
also be done for all other targets. Segher, could you test it on
rs6000?
> > > > In pr84524.c we got a loop with an extended inline asm:
> > > > asm volatile ("" : "+r" (v))
It seems a good idea to add “memory” clobber into asm and recheck DDG
in this test on aarch64 with both SMS and sched-deps patches applied.
I'll check it.
Roman