Re: [wide-int] int_traits

Richard Biener Thu, 17 Oct 2013 04:30:40 -0700

On Thu, 17 Oct 2013, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Thu, 17 Oct 2013, Richard Sandiford wrote:
> >
> >> Kenneth Zadeck <zad...@naturalbridge.com> writes:
> >> >> As mentioned in my message yesterday, I thought your new way of 
> >> >> canonising
> >> >> unsigned tree constants meant that there was always an upper zero bit.
> >> >> Is that right?
> >> > i believe this is correct.
> >> >> If so, xprecision < precision is a no-op, because the number always
> >> >> has the right form for wider precisions.  The only difficult case is
> >> >> xprecision == precision, since then we need to peel off any upper -1 
> >> >> HWIs.
> >> > say my HWI size is 8 bits (just to keep from typing a million 'f's.   if 
> >> > i have a 16 bit unsigned number that is all 1s in the tree world it is 3 
> >> > hwis
> >> > 0x00 0xff 0xff.
> >> >
> >> > but inside regular wide int, it would take 1 wide int whose value is 
> >> > 0xff.
> >> > inside of max it would be the same as the tree, but then the test 
> >> > precision < xprecision + hbpwi never kicks in because precision is 
> >> > guaranteed to be huge.
> >> >
> >> > inside of addr_wide_int, i think we tank with the assertion.
> >> 
> >> It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
> >> The initial length is greater than the maximum length of an addr_wide_int,
> >> but your "len = MAX (len, max_len)" deals with that.
> >
> > It's
> >
> >   len = MIN (len, max_len)
> 
> Oops, yeah, I meant MIN, sorry.
>
> > which looked suspicious to me, but with precision >= xprecision
> > precision can only be zero if xprecision is zero which looked to
> > me like it cannot happen - or rather it should be fixed.
> 
> Despite the comment above the code, I don't think this MIN is there
> for the zero-precision case.  I think it's there to handle the new
> tree representation.
> 
> The new tree representation can have a length greater than max_len
> for an unsigned tree constant that occupies a whole number of HWIs.
> The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
> When extended to max_wide_int the representation is the same.
> But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
> The MIN trims the length from 3 to 2 in the last case.


Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
 
> > > the case actually comes up on the ppc because they do a lot of 128 bit 
> >> > math.    I think i got thru the x86-64 without noticing this.
> >> 
> >> Well, it'd be suspicious if we're directly using 128-bit numbers
> >> in addr_wide_int.  The justification for the assertion was that we
> >> should explicitly truncate to addr_wide_int when deliberately
> >> ignoring upper bits, beyond bit or byte address width.  128 bits
> >> definitely falls into that category on powerpc.
> >
> > My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
> > wide-int value if it has precision 16.
> 
> No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
> for any wide_int wider than 16 bits though.
> 
> > AFAIK that is what the code produces,
> 
> In which case?  This is:
> 
>    precision == 16
>    xprecision == 16
>    len == 3
>    max_len == 2
> 
> The MIN trims the len to 2 and then the loop Kenny added trims it
> again to 1, so the "0x00 0xff 0xff" becomes "0xff".  The "0x00 0xff"
> is still there in the array, but not used.
> 
> > but now Kenny says this is only for some kind
> > of wide-ints but not all?  That is, why is
> >
> > inline wi::storage_ref
> > wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
> >                                         unsigned int precision, const_tree 
> > x)
> > {
> >   unsigned int len = TREE_INT_CST_NUNITS (x);
> >   const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 
> > 0);
> >   return wi::storage_ref (val, len, precision);
> > }
> >
> > not a valid implementation together with making sure that the
> > INTEGER_CST tree rep has that extra word of zeros if required?
> 
> The fundamental problem here is that we're trying to support two cases:
> 
> (a) doing N-bit arithemtic in cases where the inputs have N bits
> (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
>     and are extended according to TYPE_SIGN.
> 
> Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
> 0x8000 regardless of whether the type is signed or unsigned.  But if it's
> extended to 32-bits you get two different numbers 0xffff8000 and 0x00008000,
> depending on the sign.
> 
> So for one value of the "precision" parameter (i.e. xprecision), signed
> and unsigned constants produce the same number.  But for another value
> of the "precision" parameter (those greater than xprecision), signed and
> unsigned constants produce different numbers.  Yet at the moment the tree
> constant has a single representation.

But a correctly extended one, up to its len!  (as opposed to RTL)

> So I think the possibilities are:
> 
> (1) Use the representation of an N-bit wide_int to store N-bit tree constants.
>     Do work when extending them to wider wide_ints.
> 
> (2) Use the representation of max_wide_int to store N-bit tree constants.
>     Do work when creating an N-bit wide_int.
> 
> (3) Store both representations in the tree constant.
> 
> (4) Require all tree arithmetic to be done in the same way as rtl arithmetic,
>     with explicit extensions.  This gets rid of case (b).
> 
> (5) Require all tree arithemtic to be done in wider wide_ints than the inputs,
>     which I think is what you preferred.  This gets rid of case (a).
> 
> (6) Allow the same wide_int constant to have several different 
> representations.
>     Remember that this is to some extent what Kenny's original implementation
>     did, since partial HWIs were filled with don't-cares.  You and I are the
>     ones who argued that each wide_int should have a single representation.

As far as I can see when we need to extend the tree rep to a bigger
precision (extend according to its sign) the only thing we have to
do to make the wide-int rep canonical is sign-extend the tree rep
at the desired precision.  And as we never truncate this sign-extension
is a no-op for signed tree constants, is a no-op for unsigned
tree constants when precision > xprecision but not for unsigned tree
constants and precision == xprecision when the MSB is set.

I don't see where adding the fake zero padding for unsigneds in the
tree rep helps here.

Thus,

Index: gcc/tree.h
===================================================================
--- gcc/tree.h  (revision 203743)
+++ gcc/tree.h  (working copy)
@@ -5189,28 +5189,28 @@ wi::int_traits <const_tree>::decompose (
   /* Got to be careful of precision 0 values.  */
   if (precision)
     len = MIN (len, max_len);
-  if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED)
+  if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED
+      && precision == xprecision
+      && len == max_len)
     {
       unsigned int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1);
-      if (small_prec)
+      if (small_prec
+         && len == max_len)
        {
          /* We have to futz with this because the canonization for
             short unsigned numbers in wide-int is different from the
             canonized short unsigned numbers in the tree-cst.  */
-         if (len == max_len) 
-           {
-             for (unsigned int i = 0; i < len - 1; i++)
-               scratch[i] = val[i];
-             scratch[len - 1] = sext_hwi (val[len - 1], precision);
-             return wi::storage_ref (scratch, len, precision);
-           }
+         for (unsigned int i = 0; i < len - 1; i++)
+           scratch[i] = val[i];
+         scratch[len - 1] = sext_hwi (val[len - 1], small_prec);
+         while (len > 1 && scratch[len - 1] == (HOST_WIDE_INT)-1)
+           len--;
+         return wi::storage_ref (scratch, len, precision);
        } 
       /* We have to futz here because a large unsigned int with
         precision 128 may look (0x0 0xFFFFFFFFFFFFFFFF 0xF...) as a
         tree-cst and as (0xF...) as a wide-int.  */
-      else if (precision == xprecision && len == max_len)
-        while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
-          len--;
+      while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
+       len--;
     }
 
   /* Signed and the rest of the unsigned cases are easy.  */

explain: the current code misses compressing the now sign-extended
value in the small_prec case (and it still sext_hwi's to the wrong
precision if precision > HOST_BITS_PER_WIDE_INT).  It restricts
special-casing to the only relevant case, no explicit or
implicit extension.

?

Richard.


> > [This function shows another optimization issue:
> >
> >     case BOOLEAN_TYPE:
> >       /* Cache false or true.  */
> >       limit = 2;
> >       if (wi::leu_p (cst, 1))
> >         ix = cst.to_uhwi ();
> >
> > I would have expected cst <= 1 be optimized to cst.len == 1 &&
> > cst.val[0] <= 1.  It expands to
> >
> > <L27>:
> >   MEM[(long int *)&D.50698 + 16B] = 1;
> >   MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct 
> > wide_int_ref_storage *)&D.50698].scratch;
> >   MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
> >   MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
> >   _277 = MEM[(const struct wide_int_storage *)&cst + 260B];
> >   if (_277 <= 64)
> >     goto <bb 42>;
> >   else
> >     goto <bb 43>;
> >
> >   <bb 42>:
> >   xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
> >   _494 = MEM[(const long int *)&cst];
> >   _495 = (long unsigned int) _494;
> >   yl_496 = zext_hwi (_495, _277);
> >   _497 = xl_491 < yl_496;
> >   goto <bb 44>;
> >
> >   <bb 43>:
> >   _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage 
> > *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage 
> > *)&cst].val, len_274, _277);
> >
> > this keeps D.50698 and cst un-SRAable - inline storage is problematic
> > for this reason.  But the representation should guarantee the
> > compare with a low precision (32 bit) constant is evaluatable
> > at compile-time if len of the larger value is > 1, no?
> >
> >   <bb 44>:
> >   # _504 = PHI <_497(42), _503(43)>
> >   D.50698 ={v} {CLOBBER};
> >   if (_504 != 0)
> >     goto <bb 45>;
> >   else
> >     goto <bb 46>;
> >
> >   <bb 45>:
> >   pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B];
> >   goto <bb 229> (<L131>);
> >
> >   <bb 46>:
> >   _65 = generic_wide_int<wide_int_storage>::to_uhwi (&cst, 0);
> >   ix_66 = (int) _65;
> >   goto <bb 91>;
> >
> > The question is whether we should try to optimize wide-int for
> > such cases or simply not use wi:leu_p (cst, 1) but rather
> >
> >  if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1)
> >
> > ?
> 
> I think we should do the first, trying to optimise wide_int.
> 
> Thanks,
> Richard
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend

Re: [wide-int] int_traits

Reply via email to