On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>> <zad...@naturalbridge.com> wrote:
>>>>>
>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikest...@comcast.net> wrote:
>>>>>>>
>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>
>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>
>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT
>>>>>>>>>>                      / HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>
>>>>>>>>>> are we sure this rounds properly?  Consider a port with max byte mode
>>>>>>>>>> size 4 on a 64bit host.
>>>>>>>>>
>>>>>>>>> I do not believe that this can happen.  The core compiler includes
>>>>>>>>> all modes up to TI mode, so by default we already go up to 128 bits.
>>>>>>>>
>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>
>>>>>>> Actually, no, they are not.  Partial int modes can have bit sizes
>>>>>>> that are not powers of two, and, if there isn't an int mode that is
>>>>>>> bigger, we'd want to round up the partial int bit size.  Something
>>>>>>> like (2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>> HOST_BITS_PER_WIDE_INT should do it.
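>>>>>>>
>>>>>>> Concretely, the declaration would then become something like this
>>>>>>> (a sketch of the rounding fix just described):
>>>>>>>
>>>>>>>   HOST_WIDE_INT val[(2 * MAX_BITSIZE_MODE_ANY_INT
>>>>>>>                      + HOST_BITS_PER_WIDE_INT - 1)
>>>>>>>                     / HOST_BITS_PER_WIDE_INT];
>>>>>>>
>>>>>>> Adding HOST_BITS_PER_WIDE_INT - 1 before dividing makes the integer
>>>>>>> division round up instead of truncating, so a partial-int bitsize
>>>>>>> that isn't a multiple of HOST_BITS_PER_WIDE_INT still gets a whole
>>>>>>> extra HWI rather than being cut short.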
>>>>>>>
>>>>>>>>>> I still would like to have the ability to provide specializations
>>>>>>>>>> of wide_int for "small" sizes; ideally wide_int would be a template
>>>>>>>>>> parameterized on the number of HWIs in val.  Interface-wise
>>>>>>>>>> wide_int<2> should be identical to double_int, thus we should be
>>>>>>>>>> able to do
>>>>>>>>>>
>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>
>>>>>>>>> If you want to go down this path after the patches get in, go for
>>>>>>>>> it.  I see no use at all for this.
>>>>>>>>> This was not meant to be a plug-in replacement for double_int.  The
>>>>>>>>> goal of this patch is to get the compiler to do the constant math
>>>>>>>>> the way that the target does it.  Any such instantiation is by
>>>>>>>>> definition placing some predefined limit that some target may not
>>>>>>>>> want.
>>>>>>>>
>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>> implementations of functions that perform integer math on two-HWI
>>>>>>>> sized integers.  What I also don't like too much is that we have two
>>>>>>>> different interfaces for operating on them!  Can't you see how I've
>>>>>>>> come to dislike this?  Especially the latter …
>>>>>>>
>>>>>>> double_int is logically dead.  Refactoring wide-int and double-int is
>>>>>>> a waste of time, as the time is better spent removing double-int from
>>>>>>> the compiler.  All the necessary semantics and code of double-int
>>>>>>> _have_ been refactored into wide-int already.  Changing wide-int in
>>>>>>> any way to vend anything to double-int is wrong, as once double-int
>>>>>>> is removed, all the api changes to make double-int share from
>>>>>>> wide-int are wasted and must then be removed.  The path forward is
>>>>>>> the complete removal of double-int; it is wrong, has been wrong and
>>>>>>> always will be wrong, nothing can change that.
>>>>>>
>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we will
>>>>>> get rid of it - you will make compile-time math a _lot_ slower.  Just
>>>>>> profile what happens when you, for example, change get_inner_reference
>>>>>> to use wide_ints.
>>>>>>
>>>>>> To be able to remove double_int in favor of wide_int requires _at least_
>>>>>> templating wide_int on 'len' and providing specializations for 1 and 2.
>>>>>>
>>>>>> It might be a non-issue for math that operates on trees or RTXen due
>>>>>> to the allocation overhead we pay, but in recent years we transitioned
>>>>>> important paths away from using tree math to using double_ints _for
>>>>>> speed reasons_.
>>>>>>
>>>>>> Richard.
>>>>>
>>>>> I do not know why you believe this about the speed.  double_int
>>>>> always does synthetic math, since it does everything at 128-bit
>>>>> precision.
>>>>>
>>>>> The thing about wide_int is that, since it does math at the
>>>>> precision's size, it almost never uses synthetic operations: the sizes
>>>>> for almost every instance can be handled with the native math on the
>>>>> machine.  Almost every call has a check to see if the operation can be
>>>>> done natively.  I seriously doubt that you are going to do TI-mode
>>>>> math much faster than I do it, and if you do, who cares.
>>>>>
>>>>> The number of calls does not affect the performance in any negative
>>>>> way, and in fact it is more efficient, since common things that
>>>>> require more than one operation in double_int are typically done in a
>>>>> single operation here.
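>>>>>
>>>>> To illustrate the pattern being described, an operation checks whether
>>>>> the value fits in one HWI and only falls back to the synthetic
>>>>> multiword code otherwise.  A hypothetical sketch (the actual field and
>>>>> helper names in the patch may differ):
>>>>>
>>>>>   wide_int
>>>>>   wide_int::add (const wide_int &b) const
>>>>>   {
>>>>>     wide_int result;
>>>>>     result.precision = precision;
>>>>>     if (precision <= HOST_BITS_PER_WIDE_INT)
>>>>>       {
>>>>>         /* Fast path: native machine math covers the whole value.  */
>>>>>         result.val[0] = val[0] + b.val[0];
>>>>>         result.len = 1;
>>>>>       }
>>>>>     else
>>>>>       /* Rare multi-HWI case: do the synthetic multiword add.  */
>>>>>       add_large (result, *this, b);
>>>>>     return result;
>>>>>   }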
>>>>
>>>> Simple double-int operations like
>>>>
>>>> inline double_int
>>>> double_int::and_not (double_int b) const
>>>> {
>>>>   double_int result;
>>>>   /* Unconditionally operate on both HWIs; straight-line, branch-free.  */
>>>>   result.low = low & ~b.low;
>>>>   result.high = high & ~b.high;
>>>>   return result;
>>>> }
>>>>
>>>> are always going to be faster than conditionally executing only one
>>>> operation (but inside an out-of-line function).
>>>
>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>> appropriate here.
>>>
>>> It's interesting that you gave this example, since before you were
>>> complaining about too many fused ops.  Clearly this one could be
>>> removed in favour of separate and() and not() operations, but why
>>> not provide a fused one if there are clients who'll make use of it?
>>
>> I was more concerned about fused operations that use precision
>> or bitsize as input.  That is, for example:
>>
>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>> +  bool only_sign_bit_p () const;
>>
>> The first constructs a wide-int with precision prec (sign- or
>> zero-extending it) and then calls only_sign_bit_p on it.  Such a
>> function should not be necessary, and the existing callers should be
>> questioned instead of introducing it.
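>>
>> In other words, roughly (a sketch of the equivalence just described;
>> whether sign- or zero-extension is right depends on the caller):
>>
>>   x.only_sign_bit_p (prec)  ==  x.sext (prec).only_sign_bit_p ()
>>
>> so the caller could do the extension explicitly instead of relying on
>> a fused variant.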
>>
>> In fact wide-int seems to have so many "fused" operations that we run
>> out of sensible recognizable names for them.  Which results in a lot of
>> confusion about what the functions actually do (at least for me).
>
> Well, I suppose I can't really say anything useful either way on
> that one, since I'm not writing the patch and I'm not reviewing it :-)
>
>>> I think Kenny's API is just taking that to its logical conclusion.
>>> There doesn't seem to be anything sacrosanct about the current choice
>>> of what's fused and what isn't.
>>
>> Maybe.  I'd rather have seen an initial small wide-int API, with fused
>> operations introduced separately together with the places they are
>> used.  As it is, it's way too tedious to go over all of them, match
>> them with callers, look up enough context and then make up my mind on
>> whether the caller should do something different or not.
>>
>> Thus, consider the big initial API a reason that all this review takes
>> so long ...
>>
>>> The speed problem we had using trees for internal arithmetic isn't
>>> IMO a good argument for keeping double_int in preference to wide_int.
>>> Allocating and constructing tree objects to hold temporary values,
>>> storing an integer representation in it, then calling tree arithmetic
>>> routines that pull out the integer representation again and create a
>>> tree to hold the result, is going to be much slower than using either
>>> double_int or wide_int.  I'd be very surprised if we notice any
>>> measurable difference between double_int and wide_int here.
>>>
>>> I still see no reason to keep double_int around.  The width of a host
>>> wide integer really shouldn't have any significance.
>>>
>>> Your main complaint seems to be that the wide_int API is different
>>> from the double_int one, but we can't literally use the same API, since
>>> double_int has an implicit precision and bitsize, and wide_int doesn't.
>>> Having a precision that is separate from the underlying representation
>>> is IMO the most important feature of wide_int, so:
>>>
>>>    typedef wide_int<2> double_int;
>>>
>>> is never going to be a drop-in, API-compatible replacement for double_int.
>>
>> My reasoning was that if you strip wide-int of precision and bitsize
>> you have a double_int<N> class.
>
> But you don't!  Because...
>
>> Thus wide-int should have a base
>> of that kind and just add precision / bitsize on top of that.  It wouldn't
>> be a step forward if we end up replacing double_int uses with
>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>> would it?
>
> ...the precision and bitsize aren't an optional extra, either
> conceptually or in implementation.  wide_int happens to use N
> HOST_WIDE_INTs under the hood, but the value of N is an internal
> implementation detail.  No operations are done on N HWIs; they're done
> on the number of bits in the operands.  Whereas a double_int<N> class
> does everything on N HWIs.
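>
> A sketch of the contrast (the wide_int field names here are assumed for
> illustration; double_int's low/high are as in the code quoted earlier):
>
>   struct double_int
>   {
>     HOST_WIDE_INT low, high;    /* always exactly two HWIs */
>   };
>
>   class wide_int
>   {
>     HOST_WIDE_INT val[N];       /* N is an internal implementation detail */
>     unsigned int len;           /* number of HWIs actually in use */
>     unsigned int precision;     /* number of bits in the value */
>   };
>
> Every double_int operation unconditionally touches both HWIs, while a
> wide_int operation consults precision (and len) to decide how much work
> to do.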

If that's the only effect then either bitsize or precision is redundant
(and we also have len ...).  Note that I did not mention len above; thus
the base class would retain 'len' and double-int would simply use 2 for
it (if you don't template it but instead make it variable).
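
Concretely, something like this layering (a sketch; the names are
invented for illustration):

  template <int N>
  struct hwi_vec                  /* the double_int-like base */
  {
    HOST_WIDE_INT val[N];
    unsigned int len;             /* variable; double-int would use 2 */
  };

  class wide_int : public hwi_vec<MAX_HWIS>
  {
    unsigned int precision;       /* added on top of the base */
  };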

Richard.


> Richard
