nand marked 40 inline comments as done. nand added a comment. I have applied most of the changes you suggested to my HEAD which had significantly more functionality, including a replacement of Opcodes.in with TableGen. Quite a few of your concerns are answered by the features I have added between submitting the patch and now. The interpreter now stands at ~6k LOC.
Should I update the diff with an interpreter trimmed down to the functionality of the current diff (~3k LOC) or post the full interpreter now? What would be easier to review? ================ Comment at: clang/lib/AST/ExprVM/Compiler.cpp:88 + Params.insert({ParamDecl, ParamOffset}); + ParamOffset += align(Size); + Args.push_back(*T); ---------------- rsmith wrote: > What happens if this overflows (due to many large local variables)? Overflow happens here if a function takes more than a few million arguments. I think clang would die before it gets here. ================ Comment at: clang/lib/AST/ExprVM/Compiler.cpp:488-503 +Error Compiler::discard(const Expr *DE) { + switch (DE->getStmtClass()) { + case Stmt::BinaryOperatorClass: + return discardBinaryOperator(static_cast<const BinaryOperator *>(DE)); + default: + if (auto E = Visit(DE)) + return E; ---------------- rsmith wrote: > I'm uncomfortable about this: it looks like you're heading towards > duplicating a large portion of the bytecode generation depending on whether > the result is used. Can we do better somehow? (Eg, in CodeGen, at least for > aggregates, we use the same code but track whether the destination is used or > not.) I'd like to do this in a future diff, once we add support for more constructs which discard their results. ================ Comment at: clang/lib/AST/ExprVM/InterpFrame.h:89-90 + char *Args = nullptr; + /// Fixed, initial storage for known local variables. + std::unique_ptr<char[]> Locals; +}; ---------------- rsmith wrote: > Separate allocation for each frame seems wasteful. Have you considered using > a slab-allocated stack or similar? (Eg, allocate a large block up front and > allocate the locals from that, and allocate more blocks if the first one is > exhausted; you can cache the blocks on the `vm::Context` for reuse with no > extra round-trips to the heap.) Yes, I considered allocating this on the stack - I just don't want to add that complexity to the interpreter yet. I like to avoid changing this until it becomes a noticeable performance problem. ================ Comment at: clang/lib/AST/ExprVM/Pointer.h:30 + +/// Describes a memory block - generated once by the compiler. +struct Descriptor { ---------------- rsmith wrote: > How is this going to work for subobjects? We can't generate the full space of > all possible descriptors, as that would be unmanageable for large arrays of > complicated types. Subobjects will have pointers to descriptors embedded into their data - a pointer inside a block can follow that chain of descriptors up to the root. ================ Comment at: clang/lib/AST/ExprVM/Pointer.h:36-39 + /// Size of the object. + unsigned Size; + /// Size of an element. + unsigned ElemSize; ---------------- rsmith wrote: > Please use a distinct type for representing size-in-the-interpreter, to avoid > confusion and bugs when dealing with code that must manage both compile-time > sizes and runtime sizes. And please use `CharUnits` for representing > sizes-at-runtime. I've added an alias to be used for the sizes of object as determined by the interpreter - CharUnits will be used when interfacing with APValue. ================ Comment at: clang/lib/AST/ExprVM/Pointer.h:40-49 + /// Type of the elements. + PrimType ElemType; + /// Flag indicating if the descriptor is mutable. + bool IsMutable = false; + /// Flag indicating if the object is an array. + bool IsArray = false; + /// Flag indicating if the object is a global. ---------------- rsmith wrote: > Consider packing these 5 members into 4 bytes. I'd like to avoid packing stuff for now, makes it harder to change things later and it's not a performance problem yet. ================ Comment at: clang/lib/AST/ExprVM/Program.h:105 + /// Program bytecode. + std::vector<char> Code; + /// Mapping from bytecode offsets to source locations. ---------------- rsmith wrote: > If `CodePtr` is a plain pointer, using a single flat vector here seems > dangerous; you'll have invalidation issues if you ever trigger creation of a > `Function` while evaluating, which seems like something that will happen in > practice, eg: > > ``` > constexpr int f(); > constexpr int g() { return f(); } > constexpr int f() { return 0; } // or imported from a module at this point > constexpr int k = g(); // evaluation of this call triggers generation of code > for f > ``` This wouldn't have been a problem, but the compilation of default constructors caused issues. Now the compiler can recursively invoke itself. ================ Comment at: clang/lib/AST/ExprVM/Program.h:106-107 + std::vector<char> Code; + /// Mapping from bytecode offsets to source locations. + std::map<unsigned, SourceInfo> Source; +}; ---------------- rsmith wrote: > Using a (presumably very large) `std::map` for this would presumably be very > time- and space-inefficient. Do you expect this to be dense? (Would a flat > array of `SourceInfo` work better?) Replaced this with a dense array + binary search. ================ Comment at: clang/lib/AST/ExprVM/Type.h:51-53 +/// An integer which is as wide as the native pointer size to encode pointers. +constexpr PrimType ET_NATIVE_PTR = + sizeof(size_t) == sizeof(PrimConv<ET_UINT_64>::T) ? ET_UINT_64 : ET_UINT_32; ---------------- rsmith wrote: > What do you expect to use this for? If you need to encode a pointer, would it > make sense to have a distinct primitive type for that? Removed this type. ================ Comment at: clang/lib/AST/ExprVM/Type.h:64-75 +#define TYPE_SWITCH(Expr, B) \ + switch (Expr) { \ + case ET_SINT_8: {using T = PrimConv<ET_SINT_8>::T; do{B;}while(0); break; }\ + case ET_UINT_8: {using T = PrimConv<ET_UINT_8>::T; do{B;}while(0); break; }\ + case ET_SINT_16: {using T = PrimConv<ET_SINT_16>::T; do{B;}while(0); break; }\ + case ET_UINT_16: {using T = PrimConv<ET_UINT_16>::T; do{B;}while(0); break; }\ + case ET_SINT_32: {using T = PrimConv<ET_SINT_32>::T; do{B;}while(0); break; }\ ---------------- rsmith wrote: > Injecting `T` into this seems too implicit to me. Consider instead accepting > the name of a function-like macro and invoking it on the type. Eg: > > ``` > #define RETURN_SIZEOF(T) return sizeof(T); > TYPE_SWITCH(Type, RETURN_SIZEOF); > ``` > > Also please wrap the expansion in `do { ... } while (0)` to require the user > of the macro to provide a semicolon afterwards. I'd rather have a documented implicit than use nested macros... Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D64146/new/ https://reviews.llvm.org/D64146 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits