On Mon, Jan 7, 2019 at 1:40 AM Pavel Labath <pa...@labath.sk> wrote: > On 04/01/2019 22:19, Jonas Devlieghere via lldb-dev wrote: > > Hi Everyone, > > > > In September I sent out an RFC [1] about adding reproducers to LLDB. > > Over the > > past few months, I landed the reproducer framework, support for the GDB > > remote > > protocol and a bunch of preparatory changes. There's still an open code > > review > > [2] for dealing with files, but that one is currently blocked by a > change to > > the VFS in LLVM [3]. > > > > The next big piece of work is supporting user commands (e.g. in the > > driver) and > > SB API calls. Originally I expected these two things to be separate, but > > Pavel > > made a good case [4] that they're actually very similar. > > > > I created a prototype of how I envision this to work. As usual, we can > > differentiate between capture and replay. > > > > ## SB API Capture > > > > When capturing a reproducer, every SB function/method is instrumented > > using a > > macro at function entry. The added code tracks the function identifier > > (currently we use its name with __PRETTY_FUNCTION__) and its arguments. > > > > It also tracks when a function crosses the boundary between internal and > > external use. For example, when someone (be it the driver, the python > > binding > > or the RPC server) call SBFoo, and in its implementation SBFoo calls > > SBBar, we > > don't need to record SBBar. When invoking SBFoo during replay, it will > > itself > > call SBBar. > > > > When a boundary is crossed, the function name and arguments are > > serialized to a > > file. This is trivial for basic types. For objects, we maintain a table > that > > maps pointer values to indices and serialize the index. > > > > To keep our table consistent, we also need to track return for functions > > that > > return an object by value. We have a separate macro that wraps the > returned > > object. > > > > The index is sufficient because every object that is passed to a > > function has > > crossed the boundary and hence was recorded. During replay (see below) > > we map > > the index to an address again which ensures consistency. > > > > ## SB API Replay > > > > To replay the SB function calls we need a way to invoke the corresponding > > function from its serialized identifier. For every SB function, there's a > > counterpart that deserializes its arguments and invokes the function. > These > > functions are added to the map and are called by the replay logic. > > > > Replaying is just a matter looping over the function identifiers in the > > serialized file, dispatching the right deserialization function, until > > no more > > data is available. > > > > The deserialization function for constructors or functions that return > > by value > > contains additional logic for dealing with the aforementioned indices. > The > > resulting objects are added to a table (similar to the one described > > earlier) > > that maps indices to pointers. Whenever an object is passed as an > > argument, the > > index is used to get the actual object from the table. > > > > ## Tool > > > > Even when using macros, adding the necessary capturing and replay code is > > tedious and scales poorly. For the prototype, we did this by hand, but we > > propose a new clang-based tool to streamline the process. > > > > For the capture code, the tool would validate that the macro matches the > > function signature, suggesting a fixit if the macros are incorrect or > > missing. > > Compared to generating the macros altogether, it has the advantage that > we > > don't have "configured" files that are harder to debug (without faking > line > > numbers etc). > > > > The deserialization code would be fully generated. As shown in the > prototype > > there are a few different cases, depending on whether we have to account > for > > objects or not. > > > > ## Prototype Code > > > > I created a differential [5] on Phabricator with the prototype. It > > contains the > > necessary methods to re-run the gdb remote (reproducer) lit test. > > > > ## Feedback > > > > Before moving forward I'd like to get the community's input. What do you > > think > > about this approach? Do you have concerns or can we be smarter > > somewhere? Any > > feedback would be greatly appreciated! > > > > Thanks, > > Jonas > > > > [1] http://lists.llvm.org/pipermail/lldb-dev/2018-September/014184.html > > [2] https://reviews.llvm.org/D54617 > > [3] https://reviews.llvm.org/D54277 > > [4] https://reviews.llvm.org/D55582 > > [5] https://reviews.llvm.org/D56322 > > > > _______________________________________________ > > lldb-dev mailing list > > lldb-dev@lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev > > > > Thanks for the feedback Pavel!
> [Adding Tamas for his experience with recording and replaying APIs.] > > > Thank you for sharing the prototype Jonas. It looks very interesting, > but there are a couple of things that worry me about it. > > The first one is the usage of __PRETTY_FUNCTION__. That sounds like a > non-starter even for an initial implementation, as the string that > expands to is going to differ between compilers (gcc and clang will > probably agree on it, but I know for a fact it will be different on > msvc). It that was just an internal property of the serialization > format, then it might be fine, but it looks like you are hardcoding the > values in code to connect the methods with their replayers, which is > going to be a problem. > I should've made it clear that I didn't intend to have that checked in. Originally the idea was that we would always use the clang-based tool to generate the IDs (the __PRETTY_FUNCTION__ was just a placeholder for the prototype). I didn't give that much more thought, but if the tool would not generate the ID but just check the macro, that's becomes a valid concern again. > I've been thinking about how could this be done better, and the best > (though not ideal) way I came up with is using the functions address as > the key. That's guaranteed to be unique everywhere. Of course, you > cannot serialize that to a file, but since you already have a central > place where you list all intercepted functions (to register their > replayers), that place can be also used to assign unique integer IDs to > these functions. So then the idea would be that the SB_RECORD macro > takes the address of the current function, that gets converted to an ID > in the lookup table, and the ID gets serialized. > It sound like you would generate the indices at run-time. How would that work with regards to the the reverse mapping? > The part that bugs me about this is that taking the address of an > overloaded function is extremely tedious (you have to write something > like static_cast<function_prototype>(&SBFoo::Bar)). That would mean all > of these things would have to be passed to the RECORD macro. OTOH, the > upshot of this would be that the macro would now have sufficient > information to perform pretty decent error checking on its invocation. > Another nice about this could be that once you already have a prototype > and an address of the function, it should be possible (with sufficient > template-fu) to synthesize replay code for the function automatically, > at least in the simple cases, which would avoid the repetitiveness of > the current replay code. Together, this might obviate the need for any > clang plugins or other funny build steps. > See my previous question. I see how the signature would help with decoding but still missing how you'd get the mapping. > The second thing I noticed is the usage of pointers for identifying > object. A pointer is good for that but only while the object it points > to is alive. Once the object is gone, the pointer can (and most likely > will) be reused. So, it sounds to me like you also need to track the > lifetime of these objects. That may be as simple as intercepting > constructor/destructor calls, but I haven't seen anything like that yet > (though I haven't looked at all details of the patch). > This shouldn't be a problem. When a new object is created it will be recorded in the table with a new identifier. > Tying into that is the recording of return values. It looks like the > current RECORD_RETURN macro will record the address of the temporary > object in the frame of the current function. However, that address will > become invalid as soon as the function returns as the result object will > be copied into a location specified by the caller as a part of the > return processing. Are you handling this in any way? > We capture the temporary and the call to the copy-assignment constructor. This is not super efficient but it's the best we can do. > The final thing, which I noticed is the lack of any sign of threading > support. I'm not too worried about that, as that sounds like something > that could be fitted into the existing framework incrementally, but it > is something worth keeping in mind, as you're going to run into that > pretty soon. > Yup, I've intentially ignored this for now.
_______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev