On 8/8/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> I also haven't necessarily said what Ollie has proposed is a bad idea.
>  I have simply said the way he has come up with what he proposed is
> not the way we should go about this.  It may turn out he has come up
> with exactly the representation we want (though I doubt this, for
> various reasons).    The specification given also doesn't even explain
> where/how these operations can occur in GIMPLE, and what they do other
> than "a C++ something something".
>
> Also given that someone already wrote a type-based devirtualizer that
> worked fine, and i don't see how a points-to one is much work, I'd
> like to see more justification for things like PTRMEM_PLUS_EXPR than
> "hey, the C++ FE generates this internally".

OK.  It sounds like I need to go into a lot more detail.  The new
nodes I've proposed aren't actually motivated by the C++ front end,
but rather by a consideration of the semantics dictated by the C++
standard.  Naturally, this gives rise to some similarity, but for
instance, there is no PTRMEM_PLUS_EXPR in the C++ front end, and my
definition of PTRMEM_CST disagrees with the current node of the same
name.

Let's walk through them:


PTRMEM_TYPE

Contains the types of the member (TREE_TYPE) and class
(TYPE_PTRMEM_BASETYPE) of this pointer to member.  This is hopefully
self-explanatory.  In the language of the C++ standard, it is the type
of a "pointer to member of class TYPE_PTRMEM_BASETYPE of type
TREE_TYPE."  This is the type of PTRMEM_CSTs, PTRMEM_PLUS_EXPRs, and
various variable types (VAR_DECL, FIELD_DECL, PARM_DECL, etc.).


PTRMEM_CST

The C++ front end already has a PTRMEM_CST node.  However, the
existing node only contains a class (PTRMEM_CST_CLASS) and member
(PTRMEM_CST_MEMBER), and is unable to represent an arbitrary pointer
to member value.  This is especially evident when dealing with
multiple inheritance.  Consider the following example:

  struct B { int f (); };
  struct L : B {};
  struct R : B {};
  struct D : L, R {};

  int (B::*pb)() = &B::f;
  int (L::*pl)() = pb;
  int (R::*pr)() = pb;
  int (D::*pd[2])() = { pl, pr };

In this case, pd[0] and pd[1] both have the same type and point to the
same member of the same class (B::f), but they point to different base
class instances of B.  To represent this, we need an offset.  Now, one
might argue that rather than a numeric offset, we should point to the
_DECL of the base class subobject, but that breaks down because the
following is also legal:

  struct B {};
  struct D : B { int f (); };

  int (D::*pd)() = &D::f;
  int (B::*pb)() = static_cast<int (B::*)()>(pd);

In this case, pb points to D::f in the derived class.  Since there is
no subobject to point to, we see that a numeric offset representation
is required.
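
To make the first example concrete, here is a small sketch that can be
compiled as-is.  The "id" member and the call_through helper are my
additions for illustration (they are not part of the proposal); they
only exist so that the two B subobjects can be told apart at run time.

```cpp
#include <cassert>

// Same hierarchy as the first example, plus an "id" field (an
// illustrative assumption) that distinguishes the two B subobjects.
struct B { int id; int f () { return id; } };
struct L : B {};
struct R : B {};
struct D : L, R {};

// Calls B::f through pd[i] on a D whose L-side B has id 1 and whose
// R-side B has id 2, and returns the result.
int call_through (int i)
{
  assert (i == 0 || i == 1);

  int (B::*pb)() = &B::f;
  int (L::*pl)() = pb;
  int (R::*pr)() = pb;
  int (D::*pd[2])() = { pl, pr };

  D d;
  static_cast<L &>(d).id = 1;   // the B inside L
  static_cast<R &>(d).id = 2;   // the B inside R
  return (d.*pd[i])();
}
```

Both array elements name B::f, yet each call receives a different this
pointer, which is the information the offset has to carry.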

This leads to the definition of PTRMEM_CST which I have adopted.
Since the class type is already provided in its type, we store the
member (TREE_PTRMEM_CST_MEMBER) and numeric offset
(TREE_PTRMEM_CST_OFFSET).  The member is one of NULL (for NULL
pointers to members), a FIELD_DECL (for non-NULL pointers to data
members), or a FUNCTION_DECL (for non-NULL pointers to member
functions).  I've chosen the offset value according to convenience.
For NULL pointers to members, it's irrelevant.  For pointers to data
members, it's the offset of the member relative to the current class
(as determined by any type conversions).  For pointers to member
functions, it's the offset to the this pointer which must be passed to
the function.

PTRMEM_PLUS_EXPR

From the discussion above, it's clear that type conversions on
pointers to members require adjustments to the offsets (to fields or
this pointers).  We could handle this via CONVERT_EXPRs, but that has
two shortcomings: (1) it requires the middle end to compute offsets to
base class subobjects, and (2) as the first code example above
illustrates, multiple CONVERT_EXPRs cannot be folded together.  To
work around these issues, I've implemented the PTRMEM_PLUS_EXPR.  It's
a binary expression which takes two arguments, a PTRMEM_TYPE object,
and an integral offset expression.  These can be nicely constant
folded, either with other PTRMEM_PLUS_EXPRs or with PTRMEM_CSTs.
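
As a rough sketch of the folding I have in mind, here is a toy model
in plain C++.  PtrMemCst, fold_plus and fold_demo are hypothetical
stand-ins invented for this mail; they are not GCC tree nodes or
functions from the patch, and the member is just a string rather than
a FIELD_DECL or FUNCTION_DECL.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a PTRMEM_CST: a member plus a numeric offset.
// A null "member" models a NULL pointer to member.
struct PtrMemCst
{
  const char *member;
  std::ptrdiff_t offset;
};

// Folds PTRMEM_PLUS_EXPR <cst, delta> into a single constant.
// For a NULL pointer to member the offset is irrelevant, so the
// constant is returned unchanged.
inline PtrMemCst fold_plus (PtrMemCst cst, std::ptrdiff_t delta)
{
  if (cst.member == nullptr)
    return cst;
  return PtrMemCst{ cst.member, cst.offset + delta };
}

// Two stacked adjustments collapse to the same constant as one
// adjustment by the summed offset.
inline void fold_demo ()
{
  PtrMemCst base = { "B::f", 0 };
  PtrMemCst stacked = fold_plus (fold_plus (base, 4), 8);
  PtrMemCst single = fold_plus (base, 4 + 8);
  assert (stacked.offset == single.offset);
  assert (stacked.member == single.member);
}
```

The offsets 4 and 8 are made-up values; in a real conversion they
would come from the class layout.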

There's also an added benefit when dealing with NULL pointers to
members.  Consider the following code:

  struct B { int a; };
  struct L : B {};
  struct R : B {};
  struct D : L, R {};

  int B::*pb = NULL;
  int L::*pl = pb;
  int R::*pr = pb;
  int D::*pd[2] = { pl, pr };

The C++ standard states that pd[0] == pd[1] since all NULL pointers to
members of the same type compare equal.  However, the current GCC
implementation gets this wrong because the C++ front end implements
pointer to data member conversions via simple addition.  To be correct,
it needs to check for NULL first.  However, folding stacked conversions then
requires optimizing code like:

  if (d != NULL_MARKER) d += offset1;
  if (d != NULL_MARKER) d += offset2;
  if (d != NULL_MARKER) d += offset3;

to

  if (d != NULL_MARKER) d += offset1 + offset2 + offset3;

GCC's optimizer may well be smart enough to do this, but with
PTRMEM_PLUS_EXPRs you get this for free even with optimization
disabled.  We simply fold the stacked PTRMEM_PLUS_EXPRs into a single
expression with an operand of offset1+offset2+offset3 and add the NULL
check as part of the RTL expansion.
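
The equivalence of the stacked and folded forms can be written down
as a small sketch.  It assumes NULL_MARKER is -1 (the value the
Itanium C++ ABI uses for null pointers to data members) and that no
intermediate sum happens to equal NULL_MARKER; adjust, stacked and
folded are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: null pointers to data members are represented as -1.
constexpr std::ptrdiff_t NULL_MARKER = -1;

// One NULL-checked conversion step.
constexpr std::ptrdiff_t adjust (std::ptrdiff_t d, std::ptrdiff_t offset)
{
  return d != NULL_MARKER ? d + offset : d;
}

// Three stacked conversions, each with its own NULL check.
constexpr std::ptrdiff_t stacked (std::ptrdiff_t d, std::ptrdiff_t o1,
                                  std::ptrdiff_t o2, std::ptrdiff_t o3)
{
  return adjust (adjust (adjust (d, o1), o2), o3);
}

// The folded form: one NULL check and one addition.
constexpr std::ptrdiff_t folded (std::ptrdiff_t d, std::ptrdiff_t o1,
                                 std::ptrdiff_t o2, std::ptrdiff_t o3)
{
  return adjust (d, o1 + o2 + o3);
}

// Spot checks: a non-NULL value is adjusted identically, and NULL
// stays NULL in both forms.
inline void check_fold_equivalence ()
{
  assert (stacked (4, 0, 8, 16) == folded (4, 0, 8, 16));
  assert (stacked (NULL_MARKER, 0, 8, 16) == NULL_MARKER);
  assert (folded (NULL_MARKER, 0, 8, 16) == NULL_MARKER);
}
```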

PTRMEM_REF

This one's pretty straightforward.  It takes a class expression and a
pointer to member expression and returns a reference to the specified
field or function.  After all, we have to implement the .* and ->*
operators somehow.  :P

This is the source of my current woes, as this may involve virtual
function resolution, which can't be done with the information
currently available to the middle end.

Ollie
