On 01/05/2016 08:44 AM, Luke Wagner wrote:
2. In order to detect whether a resource leaks if something throws, you need
to know what conditions are considered as a "leak".  It's trivial to know
that for object allocations, but not for logical resources (for example an
OS handle that you received from a C API, which is an integer as far as the
compiler is concerned).
[...]
For this kind of analysis, we need two things.  One is the callgraph,
similar to the current checker.  The other is knowing whether a function
actually throws, or whether it's only declared as non-noexcept because for
example it calls a non-noexcept API that actually never throws.
I'm not super-familiar with the details, bhackett or the GC team can
fill in more, but I think this is all stuff we're doing now with the
GC rooting analysis.  IIUC, it's based on a pretty awesome C++ static
analysis framework bhackett wrote in grad school called sixgill that's
able to reason about dataflow, the global control flow graph and all
the complicated C++ features.  The basis of reasoning is annotations
on various C++ types that indicate they should not be held live across
calls that could GC and a bunch of other annotations for tweaking the
analysis when we don't want to pay a dynamic cost.  Of course, as you
pointed out, we'd need new annotations on all the resource-y things;
that's the "lots of careful work" that we all agree would be
necessary.  But even before we turned on EH, this could help tighten
the code by instituting a regular, idiomatic scheme for managing
resource-y things in a way that avoids accidental leaks when someone
refactors the code and adds a new return path.

The GC hazard static analysis uses sixgill, but really only the part that produces a simplified version of the control flow graph. Brian implemented some dataflow and general analysis infrastructure in sixgill, but we don't make use of any of that; we just process the CFG with custom JS code. I don't know much about the "built-in" constraint solver that sixgill uses; Brian would have to talk about that. Also, sixgill definitely doesn't handle all C++ features. I've had to teach it about a couple that turned out to be needed for the GC analysis, but there are a number of others that it simply punts on. (Code using those features is just discarded.) Implementing more features is certainly possible, but requires understanding the (undocumented? underdocumented?) GCC internal data structures. Admittedly, many fancy C++ features are handled earlier in compilation, before sixgill sees the code, so sixgill doesn't need to do anything for them. What we have now is certainly adequate for the GC hazard analysis, but it's an open question whether it would be adequate for exception safety.

The bigger issue is that, at least the way we're using it now, sixgill doesn't give you anything for dataflow other than a simplified CFG. You'd have to implement that. For intraprocedural stuff, that's probably not too horribly difficult if there are relatively straightforward ways of recognizing (or annotating) values of interest, but it's a lot of busywork.
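To give a rough idea of the intraprocedural work described above, here is a minimal sketch of a leak check over a simplified CFG. Everything in it is hypothetical (the Op and Node types, mayLeak, and the two example graphs); it is not sixgill's data model, and it tracks only a single resource per function, which a real analysis could not get away with.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node kinds for a simplified per-function CFG.
enum class Op { Acquire, Release, Other, Exit };

struct Node {
    Op op;
    std::vector<std::size_t> succs;  // indices of successor nodes
};

// Returns true if some path leads from an Acquire node to an Exit node
// without passing through a Release node.
bool mayLeak(const std::vector<Node>& cfg) {
    for (std::size_t start = 0; start < cfg.size(); ++start) {
        if (cfg[start].op != Op::Acquire) continue;
        std::vector<bool> seen(cfg.size(), false);
        std::vector<std::size_t> work(cfg[start].succs);
        while (!work.empty()) {
            std::size_t n = work.back();
            work.pop_back();
            if (seen[n]) continue;
            seen[n] = true;
            if (cfg[n].op == Op::Release) continue;  // resource freed on this path
            if (cfg[n].op == Op::Exit) return true;  // left the function still holding it
            for (std::size_t s : cfg[n].succs) work.push_back(s);
        }
    }
    return false;
}

// A function where a refactoring added an early return (node 3) that
// skips the release at node 2.
std::vector<Node> leakyCfg() {
    return {
        {Op::Acquire, {1}},
        {Op::Other, {2, 3}},  // branch
        {Op::Release, {4}},
        {Op::Exit, {}},       // early return, resource still held
        {Op::Exit, {}},
    };
}

// The same function with the early-return path releasing first.
std::vector<Node> fixedCfg() {
    return {
        {Op::Acquire, {1}},
        {Op::Other, {2, 3}},
        {Op::Release, {4}},
        {Op::Release, {4}},
        {Op::Exit, {}},
    };
}

void demoLeakCheck() {
    assert(mayLeak(leakyCfg()));
    assert(!mayLeak(fixedCfg()));
}
```

The hard part that this sketch skips is deciding which CFG nodes count as Acquire and Release in the first place, which is exactly where recognizing or annotating the values of interest comes in.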

Also, the call graph is necessarily conservative. Any function pointer is assumed to be able to call anything. Can function pointers be labeled noexcept? And while it does (conservatively) handle virtual function calls, it currently accepts the possibility of binary extensions, so unless annotated otherwise, it assumes that you might override any virtual method with one that invokes arbitrary code. But maybe we're comfortable disallowing that now.
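The effect of that conservatism can be illustrated with a small sketch. The CallGraph type, its field names, and mayReachTagged are all hypothetical and much simpler than what the real analysis does in JS; the point is only that any function containing an unresolved indirect call is treated as if it could reach a tagged function.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified call graph.
struct CallGraph {
    // caller -> callees that were resolved statically
    std::map<std::string, std::set<std::string>> edges;
    // functions containing a call through a function pointer or an
    // unannotated virtual method; conservatively, these may call anything
    std::set<std::string> hasUnknownCall;
    // functions tagged as directly having the property of interest
    // (for example "can GC" or, hypothetically, "can throw")
    std::set<std::string> tagged;
};

// Returns every function that may, directly or transitively, reach a
// tagged function. Functions with unknown calls are assumed to do so.
std::set<std::string> mayReachTagged(const CallGraph& g) {
    std::set<std::string> result = g.tagged;
    result.insert(g.hasUnknownCall.begin(), g.hasUnknownCall.end());
    bool changed = true;
    while (changed) {  // iterate to a fixpoint
        changed = false;
        for (const auto& entry : g.edges) {
            if (result.count(entry.first)) continue;
            for (const auto& callee : entry.second) {
                if (result.count(callee)) {
                    result.insert(entry.first);
                    changed = true;
                    break;
                }
            }
        }
    }
    return result;
}

CallGraph exampleGraph() {
    CallGraph g;
    g.edges["wrapper"] = {"allocate"};
    g.edges["helper"] = {"pureMath"};
    g.edges["dispatcher"] = {"callViaPointer"};
    g.tagged = {"allocate"};
    g.hasUnknownCall = {"callViaPointer"};
    return g;
}

void demoCallGraph() {
    std::set<std::string> r = mayReachTagged(exampleGraph());
    assert(r.count("wrapper") == 1);     // calls a tagged function
    assert(r.count("dispatcher") == 1);  // only because of the indirect call
    assert(r.count("helper") == 0);      // provably clean
}
```

In this example dispatcher is flagged purely because one of its callees makes an indirect call, which is the kind of false positive that annotations are needed to suppress.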

Oh, and the "annotations on various C++ types" currently take the form of "a hardcoded list of types stored in a JS script", but that's fixed in a version that I still haven't managed to deploy because b2g is giving me trouble for unrelated reasons. Now you can do something like

    struct MyTaggedPointer { ... } JS_HAZ_GC_POINTER;

which boils down to __attribute__((tag("GC Pointer"))). And similar for functions, which introduces the possibility of having __attribute__((tag("throws exceptions, yo"))) and then using the same code to do callgraph traversals for the ones currently tagged __attribute__((tag("GC Call"))). In other words: it is easy to add additional analyses, as long as they want to do pretty much the same thing as the existing GC analysis. I'm sure each analysis will need its own ugly collection of special cases and things. As I said, though, tagging things as "leakable resource" is not at all trivial, due to the lack of dataflow. I don't even know if the dataflow would be precise enough in practice.
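As a sketch of how such annotation macros might be wired up: JS_HAZ_GC_POINTER and the three tag strings come from the description above, while HAZARD_ANALYSIS_BUILD, HAZ_GC_CALL, HAZ_CAN_THROW, and the example functions are hypothetical names invented for illustration. The tag attribute is only meaningful to the analysis plugin, so this sketch expands the macros to nothing in an ordinary build.

```cpp
#include <cassert>
#include <stdexcept>

// Assumption: HAZARD_ANALYSIS_BUILD is a hypothetical macro that would be
// defined only when compiling under the analysis plugin. In a normal build
// the annotations vanish, so they cost nothing at runtime.
#ifdef HAZARD_ANALYSIS_BUILD
#  define JS_HAZ_GC_POINTER __attribute__((tag("GC Pointer")))
#  define HAZ_GC_CALL __attribute__((tag("GC Call")))
#  define HAZ_CAN_THROW __attribute__((tag("throws exceptions, yo")))
#else
#  define JS_HAZ_GC_POINTER
#  define HAZ_GC_CALL
#  define HAZ_CAN_THROW
#endif

// A type the analysis should treat as holding a GC pointer.
struct MyTaggedPointer {
    void* ptr;
} JS_HAZ_GC_POINTER;

// Function annotations go on the declarations.
void collectGarbage() HAZ_GC_CALL;
int parseOrThrow(int input) HAZ_CAN_THROW;

void collectGarbage() {}

int parseOrThrow(int input) {
    if (input < 0) throw std::invalid_argument("negative input");
    return input * 2;
}

// In an ordinary build the annotated code behaves like unannotated code.
void demoAnnotations() {
    MyTaggedPointer p{nullptr};
    assert(p.ptr == nullptr);
    collectGarbage();
    assert(parseOrThrow(21) == 42);
}
```

A callgraph traversal for "can throw" would then start from the functions carrying the second tag, in the same way the GC analysis starts from the ones tagged "GC Call".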

_______________________________________________
dev-tech-js-engine-internals mailing list
dev-tech-js-engine-internals@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
