Re: patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1

Richard Biener Mon, 08 Apr 2013 03:32:48 -0700

On Fri, Apr 5, 2013 at 2:34 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> Richard,
>
> There has been something that has bothered me about you proposal for the
> storage manager and i think i can now characterize that problem.  Say i want
> to compute the expression
>
> (a + b) / c
>
> converting from tree values, using wide-int as the engine and then storing
> the result in a tree.   (A very common operation for the various simplifiers
> in gcc.)
>
> in my version of wide-int where there is only the stack allocated fix size
> allocation for the data, the compiler arranges for 6 instances of wide-int
> that are "statically" allocated on the stack when the function is entered.
> There would be 3 copies of the precision and data to get things started and
> one allocation variable sized object at the end when the INT_CST is built
> and one copy to put it back.   As i have argued, these copies are of
> negligible size.
>
> In your world, to get things started, you would do 3 pointer copies to get
> the values out of the tree to set the expression leaves but then you will
> call the allocator 3 times to get space to hold the intermediate nodes
> before you get to pointer copy the result back into the result cst which
> still needs an allocation to build it. I am assuming that we can play the
> same game at the tree level that we do at the rtl level where we do 1
> variable sized allocation to get the entire INT_CST rather than doing 1
> fixed sized allocation and 1 variable sized one.
>
> even if we take the simpler example of a + b, you still loose.   The cost of
> the extra allocation and it's subsequent recovery is more than my copies.
> In fact, even in the simplest case of someone going from a HWI thru wide_int
> into tree, you have 2 allocations vs my 1.


Just to clarify, my code wouldn't handle

  tree a, b, c;
  tree res = (a + b) / c;

transparently.  The most "complex" form of the above that I think would
be reasonable to handle would be

  tree a, b, c;
  wide_int wires = (wi (a) + b) / c;
  tree res = build_int_cst (TREE_TYPE (a), wires);

and the code as posted would even require you to specify the
return type of operator+ and operator/ explicitely like

 wide_int<> wires = (wi (a).operator+<wi_embed_var>
(b)).operator/<wi_embed_var> (c);

but as I said I just didn't bother to "decide" that the return type is
always of wide_int <variable-len-storage> kind.

Now, the only real allocation that happens is done by build_int_cst.
There is one wide_int on the stack to hold the a + b result and one
separate wide_int to hold wires (it's literally written in the code).
There are no "pointer copies" involved in the end - the result from
converting a tree to a wide_int<tree-storage> is the original 'tree'
pointer itself, thus a register.

> I just do not see the cost savings and if there are no cost savings, you
> certainly cannot say that having these templates is simpler than not having
> the templates.

I think you are missing the point - by abstracting away the storage
you don't necessarily need to add the templates.  But you open up
a very easy route for doing so and you make the operations _trivially_
work on the tree / RTL storage with no overhead in generated code
and minimal overhead in the amount of code in GCC itself.  In my
prototype the overhead of adding 'tree' support is to place

class wi_tree_int_cst
{
  tree cst;
public:
  void construct (tree c) { cst = c; }
  const HOST_WIDE_INT *storage() const { return reinterpret_cast
<HOST_WIDE_INT *>(&TREE_INT_CST (cst)); }
  unsigned len() const { return 2; }
};

template <>
class wi_traits <tree>
{
public:
    typedef wide_int <wi_tree_int_cst> wi_t;
    wi_traits(tree t)
  {
    wi_tree_int_cst ws;
    ws.construct (t);
    w.construct (ws);
  }
    wi_t* operator->() { return &w; }
private:
    wi_t w;
};

into tree.h.

Richard.

> Kenny
>
>
> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>
>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>> <zad...@naturalbridge.com> wrote:
>>>
>>> This patch contains a large number of the changes requested by Richi.
>>> It
>>> does not contain any of the changes that he requested to abstract the
>>> storage layer.   That suggestion appears to be quite unworkable.
>>
>> I of course took this claim as a challenge ... with the following result.
>> It is
>> of course quite workable ;)
>>
>> The attached patch implements the core wide-int class and three storage
>> models (fixed size for things like plain HWI and double-int, variable size
>> similar to how your wide-int works and an adaptor for the double-int as
>> contained in trees).  With that you can now do
>>
>> HOST_WIDE_INT
>> wi_test (tree x)
>> {
>>    // template argument deduction doesn't do the magic we want it to do
>>    // to make this kind of implicit conversions work
>>    // overload resolution considers this kind of conversions so we
>>    // need some magic that combines both ... but seeding the overload
>>    // set with some instantiations doesn't seem to be possible :/
>>    // wide_int<> w = x + 1;
>>    wide_int<> w;
>>    w += x;
>>    w += 1;
>>    // template argument deduction doesn't deduce the return value type,
>>    // not considering the template default argument either ...
>>    // w = wi (x) + 1;
>>    // we could support this by providing rvalue-to-lvalue promotion
>>    // via a traits class?
>>    // otoh it would lead to sub-optimal code anyway so we should
>>    // make the result available as reference parameter and only support
>>    // wide_int <> res; add (res, x, 1); ?
>>    w = wi (x).operator+<wide_int<> >(1);
>>    wide_int<>::add(w, x, 1);
>>    return w.to_hwi ();
>> }
>>
>> we are somewhat limited with C++ unless we want to get really fancy.
>> Eventually providing operator+ just doesn't make much sense for
>> generic wide-int combinations (though then the issue is its operands
>> are no longer commutative which I think is the case with your wide-int
>> or double-int as well - they don't suport "1 + wide_int" for obvious
>> reasons).
>>
>> So there are implementation design choices left undecided.
>>
>> Oh, and the operation implementations are crap (they compute nonsense).
>>
>> But you should get the idea.
>>
>> Richard.
>
>

Re: patch to fix constant math - 4th patch - the wide-int class - patch ping for the next stage 1

Reply via email to