On 8/20/19 4:07 PM, Jeff Law wrote:
On 11/20/18 4:39 PM, Michael Eager wrote:
On 11/20/2018 03:10 PM, Jeff Law wrote:
On 11/20/18 11:07 AM, Michael Eager wrote:
On 11/18/2018 08:14 AM, Jeff Law wrote:
On 11/18/18 8:44 AM, Michael Eager wrote:
On 11/16/18 14:50, Segher Boessenkool wrote:
Hi!

On Wed, Nov 14, 2018 at 11:22:58AM -0800, Michael Eager wrote:
The (mem:SI) is converted to (mem:QI).

The return from make_extract() is
       (zero_extract:SI (mem:QI (reg/v/f:SI 50 [ gp ]))
          (const_int 1 [0x1])
          (const_int 2 [0x2]))

The target has an instruction pattern for zero_extract, but it
matches
SI values, not QI.  So the instruction which implements a test of a
bit
flag is never generated.

The RTL documentation says:

      If @var{loc} is in memory, its mode must be a single-byte
integer
mode.

But obviously various ports use other modes, too.  I wonder if that
ever
worked well.

Thanks.  I hadn't noticed this.

Does anyone have any idea why there is a restriction on the mode?
Other instruction patterns don't place arbitrary restriction on the
memory access mode.
Not offhand.   BUt it's been around for a long time (at least since the
early 90s).  I stumbled over it several times through the years.  It
could well be an artifact of the m68k or the vax.

The internal RTL should not be dictating what the target arch can or
cannot implement.  Reload should insert any needed conversions,
especially ones which narrow the size.

I have a patch which removes this conversion.  Works for my target.  I
built and ran 'make check' for x86 with no additional failures.  I don't
have a VAX or M68K sitting around to test.  :-)
I can do correctness tests for m68k via qemu.  Essentially verifying it
still bootstraps:-)  Just pass along a git formatted patch and I can
throw it into the tester.

Here's the patch.  I'll edit the doc if this works for 68K.
So I'm finally coming back to this.  The test certainly bootstraps on
the m68k and various other targets.  It doesn't seem to cause any
regressions on the various targets that get tested.

I wandered the backends and uses of zero_extract with MEMs not in QImode
are actually relatively rare and not likely things that are being
exercised thoroughly.  Ironically the vax seems to have the most of
these kinds of patterns.

After re-reviewing Paulo's changes from 2006 as well as an old thread
from the gcc-2 era, I'm even more concerned than I was before WRT this
change potentially allowing a read beyond the boundary of the original
object, particularly on targets that don't require strict alignment.
That could be fatal if the object happens to be at the end of a page.

The 1992 thread I suspect is what ultimately led to the note in the
manual which limits memory operands of a zero_extract to QImode.  We
didn't have version control in those days, so I can't be 100% certain.

This kind of situation is rare, but not unheard of (I've tripped over
this kind of stuff myself).

I think we're ultimately better off fixing the target to better match
what combine is doing.

Thanks for investigating this. It looks like you did quite a bit of archaeological research.

It seems to me that the only way that a read outside of the original object bounds is if the size is expanded. There's code in combine.c to insure that does not happen. I guess that an extv could be generated with incorrect operands which would be outside the object bounds, but that would be a bug.

I don't doubt that the QI restriction on zero_extract resolves some long-ago bug report, but I suspect that it hides the bug rather than fixes it. It's also possible that the actual bug has been fixed, but the restriction has never been removed. Without a test case, it isn't easy to do more than offer a conjecture.

Combine is complex, but I don't think that target descriptions should conform to its behaviors; combine should adapt to the target. (Incidentally, the target has a peephole optimization which is supposed to do what combine should be doing in this case. It worked with GCC-4; I don't recall why I never investigated why it isn't working with GCC-8. Possibly, the error occurs before peephole optimization.)

As a practical matter, the patch seems to work without problem. (I've attached it here again; I'd sent it previously in an off-list email.) The target is not in the GCC tree, but in a separate public repo, and I don't believe that it will ever be upstreamed, so there's no issue with a future merge conflict. My project to upgrade the target to GCC-8 was completed last year and I'm no longer maintaining the tool chain.

Thanks again for following up on this.


--
Michael Eager    ea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306
>From 7de0670b38c7b1cc3b805bdebe638357f384a9f8 Mon Sep 17 00:00:00 2001
From: Michael Eager <ea...@eagercon.com>
Date: Tue, 20 Nov 2018 15:37:10 -0800
Subject: [PATCH] Eliminate narrowing of zero_extract MEM reference

Combine (make_extraction) converts an 'and' operation to extract a
single bit value into a 'zero_extract' operation.  It also converts
a MEM:SI into MEM:QI.  Zero_extract is documented as only allowing
single byte memory references, perhaps a vestige of VAX or M68K.

This patch removes this narrowing of the memory reference.

* gcc/combine.c(make_extraction): Remove narrowing code.
---
 gcc/combine.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 93bd3da26d7..fdc79ab7d3e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7777,17 +7777,7 @@ make_extraction (machine_mode mode, rtx inner, 
HOST_WIDE_INT pos,
       && partial_subreg_p (extraction_mode, mode))
     extraction_mode = mode;
 
-  if (!MEM_P (inner))
-    wanted_inner_mode = wanted_inner_reg_mode;
-  else
-    {
-      /* Be careful not to go beyond the extracted object and maintain the
-        natural alignment of the memory.  */
-      wanted_inner_mode = smallest_int_mode_for_size (len);
-      while (pos % GET_MODE_BITSIZE (wanted_inner_mode) + len
-            > GET_MODE_BITSIZE (wanted_inner_mode))
-       wanted_inner_mode = GET_MODE_WIDER_MODE (wanted_inner_mode).require ();
-    }
+  wanted_inner_mode = wanted_inner_reg_mode;
 
   orig_pos = pos;
 
-- 
2.17.2

Reply via email to