[PATCH] D64146: [Clang Interpreter] Initial patch for the constexpr interpreter

Nandor Licker via Phabricator via cfe-commits Tue, 13 Aug 2019 07:08:17 -0700

nand added a comment.

updated diff, implemented requested changes




================
Comment at: clang/include/clang/Basic/LangOptions.def:291-294
+BENIGN_LANGOPT(EnableClangInterp, 1, 0,
+               "enable the experimental clang interpreter")
+BENIGN_LANGOPT(ForceClangInterp, 1, 0,
+               "force the use of the experimental constexpr interpreter")
----------------
rsmith wrote:
> "Clang" is redundant here (what else could it be?). If we're calling the new 
> thing "interp" internally, then I guess "EnableInterp" and "ForceInterp" seem 
> OK, but given that this is just scaffolding until the new interpreter is 
> done, maybe "EnableNewInterp" and "ForceNewInterp" would be better. 
> Suggestion:
> 
> ```
> BENIGN_LANGOPT(EnableNewInterp, 1, 0,
>                "enable the experimental new constexpr interpreter")
> BENIGN_LANGOPT(ForceNewInterp, 1, 0,
>                "force the use of the experimental new constexpr interpreter")
> ```
The goal is to evaluate everything that is constant - I would like to avoid 
using constexpr specifically in the name.


================
Comment at: clang/lib/AST/ExprConstant.cpp:5039-5040
+      return true;
+    case interp::InterpResult::Fail:
+      return false;
+    case interp::InterpResult::Bail:
----------------
rsmith wrote:
> Missing a check for `ForceClangInterp` here.
This is intended - if the interpreter bails and ForceInterp is set, we emit a 
diagnostic in Context, promoting the error to a failure.


================
Comment at: clang/lib/AST/Interp/ByteCodeGen.cpp:352-381
+    case BO_LAnd: {
+      if (!this->Visit(LHS))
+        return false;
+      auto Short = this->getLabel();
+      if (!this->emitDup(*TypeLHS, BO))
+        return false;
+      if (!this->jumpFalse(Short))
----------------
rsmith wrote:
> Have you considered how you will support Clang's current constant folding of 
> `&&` and `||` expressions where the left-hand side is non-constant but the 
> right hand side is a false or true constant (respectively)?
Same as with statement expressions in the evaluating emitter - some sort of 
closure which compiles and executes when triggered.


================
Comment at: clang/lib/AST/Interp/ByteCodeGen.cpp:536
+        } else {
+          // If the param is a pointer, we can dereference a dummy value.
+          if (PD->getType()->hasPointerRepresentation()) {
----------------
rsmith wrote:
> We can, but should we?
> 
> I would think the only time we can reach this case is when we're 
> constant-evaluating an expression within a function and that expression names 
> a function parameter, and in that case, the expression should be treated as 
> non-constant. We should have a more direct way to represent "if you get here, 
> the evaluation is non-constant, with this reason" in the bytecode.
We absolutely need this to emit the warning:
```
constexpr int callee_ptr(int *beg, int *end) {
  const int x = 2147483647;
  *beg = x + x; // expected-warning {{overflow in expression; result is -2 with 
type 'int'}}\
                // expected-note 3{{value 4294967294 is outside the range of 
representable values of type 'int'}}
  return *beg;
}
```


================
Comment at: clang/lib/AST/Interp/ByteCodeGen.cpp:546
+      }
+      if (auto *VD = dyn_cast<VarDecl>(DE->getDecl())) {
+        auto It = Locals.find(VD);
----------------
rsmith wrote:
> The above `ParmVarDecl` case can fall through into here, because a 
> `ParmVarDecl` is a `VarDecl`. Is that what you intended?
> 
> Do you actually need to treat locals and parameters separately? This would 
> seem simpler if you could model them as being the same thing.
Refactored + fixed this by moving stuff to separate functions.


================
Comment at: clang/lib/AST/Interp/ByteCodeGen.cpp:647-662
+  case PT_Sint8:
+    return this->emitConstSint8(Value.getSExtValue(), E);
+  case PT_Uint8:
+    return this->emitConstUint8(Value.getZExtValue(), E);
+  case PT_Sint16:
+    return this->emitConstSint16(Value.getSExtValue(), E);
+  case PT_Uint16:
----------------
rsmith wrote:
> I'd like to see this generated from a `.def` file, rather than enumerating 
> the various currently-supported integer types in many different places. The 
> goal should be that adding support for a new integer type / size (say we 
> start supporting a target with 48 bit integers) should require making changes 
> in only one place. Case in point: you're currently missing support for 
> 128-bit integers, which Clang does support across many targets.
This is the only case where such a switch occurs - there is another for zero 
initialisation, but it's quite different. A def which would capture the little 
common functionality would be quite ugly, so I would like to avoid it.

As for other integers, there won't be support for any. There are two more 
branches for the fixed-point catch-all fallback.


================
Comment at: clang/lib/AST/Interp/ByteCodeGen.h:40
+class ByteCodeGen : public ConstStmtVisitor<ByteCodeGen<Emitter>, bool>,
+                 public Emitter {
+  using NullaryFn = bool (ByteCodeGen::*)(const SourceInfo &);
----------------
rsmith wrote:
> Does `Emitter` need to be a base class rather than a member?
The entry point is through the emitter which needs to call into the code 
generator, things don't really work if the emitter is a field.


================
Comment at: clang/lib/AST/Interp/EvalEmitter.cpp:71-75
+bool EvalEmitter::bail(const SourceLocation &Loc) {
+  if (!BailLocation)
+    BailLocation = Loc;
+  return false;
+}
----------------
rsmith wrote:
> It seems like a problem that an unsupported construct in a `!isActive()` 
> context causes us to fail evaluation.
Currently the interpreter cannot work side-by-side with the interpreter since 
reading from APValue is not supported at all... Since I am unsure whether I 
ever want to implement an APValue -> Interpreter mapping, I do not mind this, 
as long as diagnostic pinpoint the next feature to be implemented.


================
Comment at: clang/lib/AST/Interp/EvalEmitter.cpp:105-111
+template <PrimType OpType> bool EvalEmitter::emitRet(const SourceInfo &Info) {
+  if (!isActive()) {
+    return true;
+  }
+  using T = typename PrimConv<OpType>::T;
+  return ReturnValue<T>(S.Stk.pop<T>(), Result);
+}
----------------
rsmith wrote:
> Can this happen? I would expect not: if we can't cope with backwards jumps 
> here, I wouldn't expect us to be able to cope with function calls / returns.
> 
> If this can happen, we need to skip anything that happens after the `return` 
> rather than keeping on evaluating it.
Theoretically not, but it helped me catch a few bugs in the past, so I would 
like to keep it conditional.


================
Comment at: clang/lib/AST/Interp/EvalEmitter.h:69
+  bool jump(const LabelTy &Label);
+  bool fallthrough(const LabelTy &Label);
+
----------------
rsmith wrote:
> I can't find any calls to this, and I'm not sure what it's for. Is it 
> necessary?
The conditional operator will require it to handle the fallthrough from the 
false branch to the following expression.


================
Comment at: clang/lib/AST/Interp/EvalEmitter.h:112-114
+  /// Since expressions can only jump forward, predicated execution is
+  /// used to deal with if-else statements.
+  bool isActive() { return CurrentLabel == ActiveLabel; }
----------------
rsmith wrote:
> I'm curious how much this approach costs us, compared to allowing the emitter 
> to tell `ByteCodeGen` whether to emit both arms of a `?:` (or if not, telling 
> it which one to follow) and similarly for `&&` / `||`. There's the constant 
> overhead of checking `isActive()` on each action, plus the cost of the calls 
> into `EvalEmitter` while "evaluating" the unselected operand. (Eg, `false ? 
> some-very-large-expression : 0` would be evaluated much faster if we could 
> tell `ByteCodeGen` not to bother with the unselected operand.)
> 
> (This approach doesn't support statement-expressions, because there can be 
> backwards jumps in those, so I think `?:`/`&&`/`||` are pretty much the only 
> things this supports, right?)
I'd like to revisit this at a later point in time - for now I think predication 
is the simplest option and I would like to stick with the simple options until 
the interpreter is feature-complete.

Statement expressions would not be predicated, the current plan is to generate 
a closure-like method for them.
Another idea would be to revert to bytecode generation whenever the landing pad 
of a potentially backwards jump is encountered - these spots are quite well 
defined.



================
Comment at: clang/lib/AST/Interp/Interp.h:213-222
+  if (!Pointer::isComparable(LHS, RHS)) {
+    const auto &Loc = S.Current->getSource(OpPC);
+    S.FFDiag(Loc, diag::note_invalid_subexpr_in_const_expr);
+    return false;
+  } else {
+    auto VL = LHS.getOffset();
+    auto VR = RHS.getOffset();
----------------
rsmith wrote:
> This is missing lots of necessary checks; I think your chosen model for 
> pointers makes these checks quite hard to implement. The validity of pointer 
> comparisons in constant evaluation depends on the point of divergence of the 
> two access paths. For example:
> 
> ```
> struct A { int n; };
> struct B : A { int m; } b;
> B ba[10];
> constexpr bool x = ba[0].n < &ba[0].m; // error, not constant, divergence 
> point is base class
> constexpr bool y = ba[0].n < &ba[1].m; // ok, divergence point is array 
> indexing
> ```
> 
> Also, access control must be taken into account:
> 
> ```
> struct A {
>   int x;
> private:
>   int y;
>   constexpr bool q(A a) { return &a.x < &a.y; } // not constant, x and y have 
> different access
> };
> ```
I don't think there's a zero-cost way around this. We'll need to track more 
metadata to check whether fields are public/private and traverse the pointer 
chain to find divergence points. The InlineDescriptors might have a few spare 
bits could be used to cache "tags" which classify pointers into comparable 
classes? At this point, it is quite hard to decide what case to make fast at 
what cost.


================
Comment at: clang/lib/AST/Interp/Interp.h:218-219
+  } else {
+    auto VL = LHS.getOffset();
+    auto VR = RHS.getOffset();
+    S.Stk.push<BoolT>(BoolT::from(Fn(Compare(VL, VR))));
----------------
rsmith wrote:
> On what basis do you assume that the orders of the offsets in the 
> compile-time representation has any relation to the correct order for a 
> pointer comparison? Do you intend to guarantee this when doing struct layout 
> for the interpreter? If so, please include a comment here explaining that.
Yes, the layout of fields in the interpreter is supposed to match the target's 
record layout.


================
Comment at: clang/lib/AST/Interp/InterpFrame.cpp:25-36
+  if (Func) {
+    if (auto FrameSize = Func->getFrameSize()) {
+      Locals = llvm::make_unique<char[]>(FrameSize);
+      for (auto &Scope : Func->scopes()) {
+        for (auto &Local : Scope.locals()) {
+          Block *B = new (localBlock(Local.Offset)) Block(Local.Desc);
+          if (Local.Desc->CtorFn)
----------------
rsmith wrote:
> It seems wasteful to me to do this initialization work when entering the 
> frame rather than when reaching the declaration of the variable in question. 
> Do we need to do it early like this?
> 
> How does this deal with loops? I would have expected that we'd re-initialize 
> a loop scope each time we enter the loop body, so that we don't confuse 
> variables from one iteration with variables from another.
We can optimise this later, avoiding the need to create blocks for local 
primitives which never have pointers taken. If a local does not have a pointer 
generated to it using GetLocalPtr, the block is not needed at all and 
zero-initialisation using memset should do the job.


================
Comment at: clang/lib/AST/Interp/Opcodes.td:1
+//===--- Opcodes.td - Opcode defitions for the constexpr VM -----*- C++ 
-*-===//
+//
----------------
rsmith wrote:
> This still needs to be expanded to actually describe what the opcodes do 
> (that is, their effects on the stack and any side-effects they cause). 
> Something simple like:
> 
> ```
> // [RetVal] -> [], returns RetVal
> def Ret : Opcode {
> ```
> 
> ```
> // [A, B] -> [B - A]
> def Sub : AluRealOpcode;
> ```
> 
> would be enough.
I was a bit afraid of documenting everything and then changing everything, but 
things seem to be quite stable now, so I will add some documentation.


================
Comment at: clang/lib/AST/Interp/Pointer.h:69-78
+  /// Start of the chain of pointers.
+  Pointer *Pointers = nullptr;
+  /// Flag indicating if the block has static storage duration.
+  bool IsStatic = false;
+  /// Flag indicating if the block is an extern.
+  bool IsExtern = false;
+  /// Flag indicating if the pointer is dead.
----------------
rsmith wrote:
> If we're allocating one of these for every storage allocation in the 
> evaluation, packing these bools into the spare bits in the pointers seems 
> worthwhile.
It's definitely on the TODO list - I would like to avoid such optimisations for 
now, until enough features are available to run sizeable benchmarks.


================
Comment at: clang/lib/AST/Interp/Pointer.h:245-249
+  /// Returns the embedded descriptor preceding a field.
+  InlineDescriptor *getInlineDescriptor() const {
+    assert(Base != 0 && "Not a nested pointer");
+    return reinterpret_cast<InlineDescriptor *>(Pointee->data() + Base) - 1;
+  }
----------------
rsmith wrote:
> What is this for? I think we need a design document here describing the 
> intended model for pointers, references, descriptors, and so on -- preferably 
> something that can be included in the internals manual.
I haven't written things up yet since I changed quite a few things over the 
past few weeks. I am wrapping up support for unions and will produce a more 
detailed document afterwards. In the meantime, I have documented this field.


================
Comment at: clang/lib/AST/Interp/Pointer.h:251-261
+  /// The block the pointer is pointing to.
+  Block *Pointee = nullptr;
+  /// Start of the current subfield.
+  unsigned Base = 0;
+  /// Offset into the block.
+  unsigned Offset = 0;
+
----------------
rsmith wrote:
> You will also need to track at least:
> 
>  * Whether this is a one-past-the-end pointer
>  * (Information equivalent to) which union members were accessed on the path 
> to this point
> 
> I think everything else can be reconstructed from the type and offset.
It's on the TODO list - I am aiming for C++11 conformance ASAP and this will be 
required.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64146/new/

https://reviews.llvm.org/D64146



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D64146: [Clang Interpreter] Initial patch for the constexpr interpreter

Reply via email to