On Thu, 17 Oct 2013, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Thu, 17 Oct 2013, Richard Sandiford wrote: > > > >> Kenneth Zadeck <zad...@naturalbridge.com> writes: > >> >> As mentioned in my message yesterday, I thought your new way of > >> >> canonising > >> >> unsigned tree constants meant that there was always an upper zero bit. > >> >> Is that right? > >> > i believe this is correct. > >> >> If so, xprecision < precision is a no-op, because the number always > >> >> has the right form for wider precisions. The only difficult case is > >> >> xprecision == precision, since then we need to peel off any upper -1 > >> >> HWIs. > >> > say my HWI size is 8 bits (just to keep from typing a million 'f's. if > >> > i have a 16 bit unsigned number that is all 1s in the tree world it is 3 > >> > hwis > >> > 0x00 0xff 0xff. > >> > > >> > but inside regular wide int, it would take 1 wide int whose value is > >> > 0xff. > >> > inside of max it would be the same as the tree, but then the test > >> > precision < xprecision + hbpwi never kicks in because precision is > >> > guaranteed to be huge. > >> > > >> > inside of addr_wide_int, i think we tank with the assertion. > >> > >> It should be OK for addr_wide_int too. The precision still fits 2 HWIs. > >> The initial length is greater than the maximum length of an addr_wide_int, > >> but your "len = MAX (len, max_len)" deals with that. > > > > It's > > > > len = MIN (len, max_len) > > Oops, yeah, I meant MIN, sorry. > > > which looked suspicious to me, but with precision >= xprecision > > precision can only be zero if xprecision is zero which looked to > > me like it cannot happen - or rather it should be fixed. > > Despite the comment above the code, I don't think this MIN is there > for the zero-precision case. I think it's there to handle the new > tree representation. > > The new tree representation can have a length greater than max_len > for an unsigned tree constant that occupies a whole number of HWIs. > The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. > When extended to max_wide_int the representation is the same. > But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. > The MIN trims the length from 3 to 2 in the last case.
Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). > > > the case actually comes up on the ppc because they do a lot of 128 bit > >> > math. I think i got thru the x86-64 without noticing this. > >> > >> Well, it'd be suspicious if we're directly using 128-bit numbers > >> in addr_wide_int. The justification for the assertion was that we > >> should explicitly truncate to addr_wide_int when deliberately > >> ignoring upper bits, beyond bit or byte address width. 128 bits > >> definitely falls into that category on powerpc. > > > > My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid > > wide-int value if it has precision 16. > > No, for a 16-bit wide_int it should be 0xff. 0x00 0xff 0xff is correct > for any wide_int wider than 16 bits though. > > > AFAIK that is what the code produces, > > In which case? This is: > > precision == 16 > xprecision == 16 > len == 3 > max_len == 2 > > The MIN trims the len to 2 and then the loop Kenny added trims it > again to 1, so the "0x00 0xff 0xff" becomes "0xff". The "0x00 0xff" > is still there in the array, but not used. > > > but now Kenny says this is only for some kind > > of wide-ints but not all? That is, why is > > > > inline wi::storage_ref > > wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch, > > unsigned int precision, const_tree > > x) > > { > > unsigned int len = TREE_INT_CST_NUNITS (x); > > const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, > > 0); > > return wi::storage_ref (val, len, precision); > > } > > > > not a valid implementation together with making sure that the > > INTEGER_CST tree rep has that extra word of zeros if required? > > The fundamental problem here is that we're trying to support two cases: > > (a) doing N-bit arithemtic in cases where the inputs have N bits > (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits > and are extended according to TYPE_SIGN. > > Let's assume 32-bit HWIs. The 16-bit (4-hex-digit) constant 0x8000 is > 0x8000 regardless of whether the type is signed or unsigned. But if it's > extended to 32-bits you get two different numbers 0xffff8000 and 0x00008000, > depending on the sign. > > So for one value of the "precision" parameter (i.e. xprecision), signed > and unsigned constants produce the same number. But for another value > of the "precision" parameter (those greater than xprecision), signed and > unsigned constants produce different numbers. Yet at the moment the tree > constant has a single representation. But a correctly extended one, up to its len! (as opposed to RTL) > So I think the possibilities are: > > (1) Use the representation of an N-bit wide_int to store N-bit tree constants. > Do work when extending them to wider wide_ints. > > (2) Use the representation of max_wide_int to store N-bit tree constants. > Do work when creating an N-bit wide_int. > > (3) Store both representations in the tree constant. > > (4) Require all tree arithmetic to be done in the same way as rtl arithmetic, > with explicit extensions. This gets rid of case (b). > > (5) Require all tree arithemtic to be done in wider wide_ints than the inputs, > which I think is what you preferred. This gets rid of case (a). > > (6) Allow the same wide_int constant to have several different > representations. > Remember that this is to some extent what Kenny's original implementation > did, since partial HWIs were filled with don't-cares. You and I are the > ones who argued that each wide_int should have a single representation. As far as I can see when we need to extend the tree rep to a bigger precision (extend according to its sign) the only thing we have to do to make the wide-int rep canonical is sign-extend the tree rep at the desired precision. And as we never truncate this sign-extension is a no-op for signed tree constants, is a no-op for unsigned tree constants when precision > xprecision but not for unsigned tree constants and precision == xprecision when the MSB is set. I don't see where adding the fake zero padding for unsigneds in the tree rep helps here. Thus, Index: gcc/tree.h =================================================================== --- gcc/tree.h (revision 203743) +++ gcc/tree.h (working copy) @@ -5189,28 +5189,28 @@ wi::int_traits <const_tree>::decompose ( /* Got to be careful of precision 0 values. */ if (precision) len = MIN (len, max_len); - if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED) + if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED + && precision == xprecision + && len == max_len) { unsigned int small_prec = precision & (HOST_BITS_PER_WIDE_INT - 1); - if (small_prec) + if (small_prec + && len == max_len) { /* We have to futz with this because the canonization for short unsigned numbers in wide-int is different from the canonized short unsigned numbers in the tree-cst. */ - if (len == max_len) - { - for (unsigned int i = 0; i < len - 1; i++) - scratch[i] = val[i]; - scratch[len - 1] = sext_hwi (val[len - 1], precision); - return wi::storage_ref (scratch, len, precision); - } + for (unsigned int i = 0; i < len - 1; i++) + scratch[i] = val[i]; + scratch[len - 1] = sext_hwi (val[len - 1], small_prec); + while (len > 1 && scratch[len - 1] == (HOST_WIDE_INT)-1) + len--; + return wi::storage_ref (scratch, len, precision); } /* We have to futz here because a large unsigned int with precision 128 may look (0x0 0xFFFFFFFFFFFFFFFF 0xF...) as a tree-cst and as (0xF...) as a wide-int. */ - else if (precision == xprecision && len == max_len) - while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1) - len--; + while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1) + len--; } /* Signed and the rest of the unsigned cases are easy. */ explain: the current code misses compressing the now sign-extended value in the small_prec case (and it still sext_hwi's to the wrong precision if precision > HOST_BITS_PER_WIDE_INT). It restricts special-casing to the only relevant case, no explicit or implicit extension. ? Richard. > > [This function shows another optimization issue: > > > > case BOOLEAN_TYPE: > > /* Cache false or true. */ > > limit = 2; > > if (wi::leu_p (cst, 1)) > > ix = cst.to_uhwi (); > > > > I would have expected cst <= 1 be optimized to cst.len == 1 && > > cst.val[0] <= 1. It expands to > > > > <L27>: > > MEM[(long int *)&D.50698 + 16B] = 1; > > MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct > > wide_int_ref_storage *)&D.50698].scratch; > > MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1; > > MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32; > > _277 = MEM[(const struct wide_int_storage *)&cst + 260B]; > > if (_277 <= 64) > > goto <bb 42>; > > else > > goto <bb 43>; > > > > <bb 42>: > > xl_491 = zext_hwi (1, 32); // ok, checking enabled and thus out-of-line > > _494 = MEM[(const long int *)&cst]; > > _495 = (long unsigned int) _494; > > yl_496 = zext_hwi (_495, _277); > > _497 = xl_491 < yl_496; > > goto <bb 44>; > > > > <bb 43>: > > _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage > > *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage > > *)&cst].val, len_274, _277); > > > > this keeps D.50698 and cst un-SRAable - inline storage is problematic > > for this reason. But the representation should guarantee the > > compare with a low precision (32 bit) constant is evaluatable > > at compile-time if len of the larger value is > 1, no? > > > > <bb 44>: > > # _504 = PHI <_497(42), _503(43)> > > D.50698 ={v} {CLOBBER}; > > if (_504 != 0) > > goto <bb 45>; > > else > > goto <bb 46>; > > > > <bb 45>: > > pretmp_563 = MEM[(const struct wide_int_storage *)&cst + 256B]; > > goto <bb 229> (<L131>); > > > > <bb 46>: > > _65 = generic_wide_int<wide_int_storage>::to_uhwi (&cst, 0); > > ix_66 = (int) _65; > > goto <bb 91>; > > > > The question is whether we should try to optimize wide-int for > > such cases or simply not use wi:leu_p (cst, 1) but rather > > > > if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1) > > > > ? > > I think we should do the first, trying to optimise wide_int. > > Thanks, > Richard > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend