Richard Sandiford wrote:
> Georg-Johann Lay <a...@gjlay.de> writes:
>> This patch fixes PR51374 by more strictly updating mem_last_set.
>> Sloppy handling of mem_last_set can lead to error in volatile correctness
>> because combine then is allowed to drag one volatile access over an other
>> volatile thing (volatile access, asm volatile, unspec_volatile...).
>>
>> As explained in the source comment, any volatile might change memory, even a
>> volatile read.
> 
> This looks too broad to me.  Volatile MEMs don't usually alias
> non-volatile ones (so for example, reads from volatile memory shouldn't
> affect reads from non-volatile memory).

The patch tried to fix the issue. If there is less invasive solutions, that's
always preferred, of course.

> What do you think about instead changing:
> 
>   /* If there are any volatile insns between INSN and I3, reject, because
>      they might affect machine state.  */
> 
>   for (p = NEXT_INSN (insn); p != i3; p = NEXT_INSN (p))
>     if (INSN_P (p) && p != succ && p != succ2 && volatile_insn_p (PATTERN 
> (p)))
>       return 0;
> 
> to:
> 
>   /* If INSN contains volatile references (specifically volatile MEMs),
>      we cannot combine across any other volatile references.
>      Even if INSN doesn't contain volatile references, any intervening
>      volatile insn might affect machine state.  */
> 
>   pred = volatile_refs_p (insn) ? volatile_refs_p : volatile_insn_p;
>   for (p = NEXT_INSN (insn); p != i3; p = NEXT_INSN (p))
>     if (NONDEBUG_INSN_P (p) && p != succ && p != succ2 && pred (PATTERN (p)))
>       return 0;

This would still allow a volatile_refs_p to cross a volatile_insn_p?

And some lines above there is this:

  /* If INSN contains anything volatile, or is an `asm' (whether volatile
     or not), reject, unless nothing volatile comes between it and I3 */

  if (GET_CODE (src) == ASM_OPERANDS || volatile_refs_p (src))
    {
      /* Make sure neither succ nor succ2 contains a volatile reference.  */
      if (succ2 != 0 && volatile_refs_p (PATTERN (succ2)))
        return 0;
      if (succ != 0 && volatile_refs_p (PATTERN (succ)))
        return 0;
      /* We'll check insns between INSN and I3 below.  */
    }

Would our change (whatever it will look like) make that code superfluous?

> As you say in the PR notes, we've already handled volatile_insn_p (insn)
> by this stage, so the new loop is handling:
> 
>    volatile_refs_p (insn) && !volatile_insn_p (insn)

There is this part of an if-clause in can_combine_p:

      /* Make sure that the value that is to be substituted for the register
         does not use any registers whose values alter in between.  However,
         If the insns are adjacent, a use can't cross a set even though we
         think it might (this can happen for a sequence of insns each setting
         the same destination; last_set of that register might point to
         a NOTE).  If INSN has a REG_EQUIV note, the register is always
         equivalent to the memory so the substitution is valid even if there
         are intervening stores.  Also, don't move a volatile asm or
         UNSPEC_VOLATILE across any other insns.  */
      || (! all_adjacent
          && (((!MEM_P (src)
                || ! find_reg_note (insn, REG_EQUIV, src))
               && use_crosses_set_p (src, DF_INSN_LUID (insn)))
              || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
              || GET_CODE (src) == UNSPEC_VOLATILE))

so it's not a verbatim volatile_insn_p. Maybe it's for historical reason or for
some subtle differences not explained; only someone familiar with combine will
know... perhaps performance issue.

>> Moreover, writes of the shape (set (zero_extract (mem ... update 
>> mem_last_set.
> 
> Good catch.  I think we should use note_stores for added safety though,
> rather than call zero_extract out as a special case.  (I realise the
> function already handles subregs, but still.)

record_dead_and_set_regs_1 is called through note_stores so that (set
(zero_extract (mem)) is handled already and thus no change to
record_dead_and_set_regs_1 is needed, right?

Attached you find a new, tentative patch. It resolves the issue in my small
test program.
However, I think someone with more insight into combine should take over the 
patch.

Johann

        PR rtl-optimization/51374
        * combine.c (can_combine_p): Don't allow volatile_refs_p insns
        to cross other volatile_refs_p insns.

Index: combine.c
===================================================================
--- combine.c	(revision 183695)
+++ combine.c	(working copy)
@@ -1947,12 +1947,21 @@ can_combine_p (rtx insn, rtx i3, rtx pre
       && REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER)
     return 0;
 
-  /* If there are any volatile insns between INSN and I3, reject, because
-     they might affect machine state.  */
+  /* If INSN contains volatile references (specifically volatile MEMs),
+     we cannot combine across any other volatile references.
+     Even if INSN doesn't contain volatile references, any intervening
+     volatile insn might affect machine state.  */
+  {
+    int (*predicat) (const_rtx);
 
-  for (p = NEXT_INSN (insn); p != i3; p = NEXT_INSN (p))
-    if (INSN_P (p) && p != succ && p != succ2 && volatile_insn_p (PATTERN (p)))
-      return 0;
+    predicat = volatile_refs_p (PATTERN (insn))
+      ? volatile_refs_p
+      : volatile_insn_p;
+    
+    for (p = NEXT_INSN (insn); p != i3; p = NEXT_INSN (p))
+      if (INSN_P (p) && p != succ && p != succ2 && predicat (PATTERN (p)))
+        return 0;
+  }
 
   /* If INSN contains an autoincrement or autodecrement, make sure that
      register is not used between there and I3, and not already used in

Reply via email to