Kenneth Zadeck <[email protected]> writes:
> On 12/14/2013 06:40 AM, Richard Sandiford wrote:
>> Hi Kenny,
>>
>> Sorry for the slow response.
>>
>> Kenneth Zadeck <[email protected]> writes:
>>> Index: gcc/wide-int.cc
>>> ===================================================================
>>> --- gcc/wide-int.cc (revision 205765)
>>> +++ gcc/wide-int.cc (working copy)
>>> @@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>>> unsigned int j;
>>> unsigned int blocks_needed = BLOCKS_NEEDED (prec);
>>> unsigned int half_blocks_needed = blocks_needed * 2;
>>> - /* The sizes here are scaled to support a 2x largest mode by 2x
>>> - largest mode yielding a 4x largest mode result. This is what is
>>> - needed by vpn. */
>>>
>>> + /* The sizes here are scaled to support a 2*largest mode + a little
>>> + by 2*largest mode + a little yielding a 4*largest mode + a
>>> + little. This is what is needed by vpn. */
>>> unsigned HOST_HALF_WIDE_INT
>>> - u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>>> + u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
>>> + unsigned int ulen;
>>> unsigned HOST_HALF_WIDE_INT
>>> - v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>>> - /* The '2' in 'R' is because we are internally doing a full
>>> + v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
>>> + unsigned int vlen;
>>> + unsigned int uvlen;
>>> + unsigned int vallen;
>>> +
>>> + /* The '4' in 'R' is because we are internally doing a wide
>>> multiply. */
>>> unsigned HOST_HALF_WIDE_INT
>>> - r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>>> -   HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
>>> + r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
>> While we're here I think we should convert these arrays into allocas,
>> so that we're not hard-coding the specialness of vpn in wide-int.cc.
>> I.e. rather than having arrays of maximum size, use XALLOCAVEC to
>> allocate a specific number of stack HWIs, once we know how many
>> we actually need. That includes the new arrays added in the patch.
> I had thought of doing this but there is a part of me that has always
> thought of this as being more expensive, but in the scheme of things it
> is just noise here, so I will do it.
Thanks.
>>> + /* True if the result needs to be negated. */
>>> + bool is_neg = false;
>>>
>>> /* If the top level routine did not really pass in an overflow, then
>>> just make sure that we never attempt to set it. */
>>> bool needs_overflow = (overflow != 0);
>>> + const HOST_WIDE_INT zero = 0;
>>> if (needs_overflow)
>>> *overflow = false;
>>>
>>> @@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>>> return 1;
>>> }
>>>
>>> - /* We do unsigned mul and then correct it. */
>>> - wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
>>> - half_blocks_needed, prec, SIGNED);
>>> - wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
>>> - half_blocks_needed, prec, SIGNED);
>>> -
>>> - /* The 2 is for a full mult. */
>>> - memset (r, 0, half_blocks_needed * 2
>>> - * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
>>> + ulen = op1len * 2;
>>> + vlen = op2len * 2;
>>> +
>>> + if (sgn == SIGNED)
>>> + {
>>> + if (wi::neg_p (op1))
>>> + {
>>> + HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
>>> +
>> E.g. following on from the above:
>>
>> /* Add an extra HWI so that we can represent the negative of
>> the minimum. */
>> HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1);
>>
>>> + int nop1len
>>> +       = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0);
>> Not sure about the "prec + 1" here, since it could cause nop1len to be
>> greater than the maximum length for "prec". And you go on to unpack
>> nop1 in "prec" rather than "prec + 1".
> The unpacking with prec is the wrong part. That should have been prec + 1.
Hmm, I still think it's the opposite, because...
>> AIUI, once you've taken the absolutes, you've got two unsigned numbers
>> of precision "prec" and create a "prec * 2" result, so maybe:
>> sub_large (..., prec, UNSIGNED, 0) instead?
>
> The choice of signed vs. unsigned does not matter here. The prec + 1
> means we never overflow.
...my point is that after this we treat the number as unsigned,
so it isn't a question of whether the absolute value overflows
the precision as a signed number but whether it overflows as an
unsigned number. And it never will. It might need one more HWI
than the original input has, if op1len < blocks_needed, but it shouldn't
need extra precision.
E.g. say for 8-bit HWIs we start out with a signed -0x80 * -0x80.
We convert that into an unsigned 0x80 * 0x80, which is still
precision 8 * precision 8 -> precision 16. And we'd need to be
able to handle those precision-8 values correctly (with no extra
zero HWI) if we had an unsigned 0x80 * 0x80 from the outset.
>>> + is_neg = !is_neg;
>>> + ulen = nop1len * 2;
>>> + wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
>>> + ulen, prec, SIGNED);
>> Back on the alloca theme, we'd have:
>>
>> u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);
>>
>> before this and other unpacks.
>>
>> Note that there's supposed to be space after ")" in casts. Lots of
>> instances in the patch :-)
>>
>>> + /* UNSIGNED mul. */
>>> + /* For large positive integers inside regular wide-ints, we must
>>> + explictly expand out the 1s to full width for the mul to be
>>> + correct. Note that the top bit will never be set for a
>>> + unsigned number in offset_int of widest-int. */
>> s/of/or/. But I think the comment is confusing because the unsignedness
>> is a property of the multiplication rather than the inputs. We should never
>> be doing an unsigned operation on widest_int to begin with. And if
>> offset_ints are being treated as having a sign then the same is true there.
>>
>> If we ever do have an unsigned multiplication on those types for some reason
>> then -1 will (rightly) be treated as the maximum unsigned value.
> The real question here is "do you want to do an assert or do you want to
> explain what happens if the input comes in that way?"
Sorry, to be clear, I was only talking about the "Note..." part of the
comment rather than the full thing. I think the first part of the comment
and the code itself is fine.
> The current world
> is actually structured so that we never ask about overflow for the two
> larger classes because the reason that you used those classes was that
> you never wanted to have this discussion. So if you never ask about
> overflow, then it really does not matter because we are not going to
> return enough bits for you to care what happened on the inside. Of
> course that could change and someone could say that they wanted overflow
> on widest-int. Then the comment makes sense, with revisions, unless
> your review of the code that wants overflow on widest int suggests that
> they are just being stupid.
But widest_int is now supposed to be at least 1 bit wider than the widest
input type (unlike previously, when it was double the widest input type).
So I can definitely see cases where we'd want to know whether a
widest_int * widest_int result overflows.
My point is that the widest_int * widest_int would normally be a signed
multiplication rather than an unsigned multiplication, since the extra
1 bit of precision allows every operation to be signed. So it isn't
a case of whether the top bit of a widest_int will be set, but whether
we ever reach here for widest_int in the first place. We should be
going down the sgn == SIGNED path rather than the sgn == UNSIGNED path.
widest_int can represent an all-1s value, usually interpreted as -1.
If we do go down this sgn == UNSIGNED path for widest_int then we will
instead treat the all-1s value as the maximum unsigned number, just like
for any other kind of wide int.
As far as this function goes there really is no difference between
wide_int, offset_int and widest_int. Which is good, because offset_int
and widest_int should just be wide_ints that are optimised for a specific
and fixed precision.
Thanks,
Richard