On 01/05/2016 08:44 AM, Luke Wagner wrote:
2. In order to detect whether a resource leaks if something throws, you need
to know what conditions are considered a "leak". It's trivial to know
that for object allocations, but not for logical resources (for example an
OS handle that you received from a C API, which is an integer as far as the
compiler is concerned).
[...]
For this kind of analysis, we need two things. One is the callgraph,
similar to the current checker. The other is knowing whether a function
actually throws, or whether it's only declared as non-noexcept because for
example it calls a non-noexcept API that actually never throws.
I'm not super-familiar with the details (bhackett or the GC team can
fill in more), but I think this is all stuff we're doing now with the
GC rooting analysis. IIUC, it's based on a pretty awesome C++ static
analysis framework bhackett wrote in grad school called sixgill that's
able to reason about dataflow, the global control flow graph and all
the complicated C++ features. The basis of reasoning is annotations
on various C++ types that indicate they should not be held live across
calls that could GC and a bunch of other annotations for tweaking the
analysis when we don't want to pay a dynamic cost. Of course, as you
pointed out, we'd need new annotations on all the resource-y things;
that's the "lots of careful work" that we all agree would be
necessary. But even before we turned on EH, this could help tighten
the code by instituting a regular, idiomatic scheme for managing
resource-y things in a way that avoids accidental leaks when someone
refactors the code and adds a new return path.
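
To make the "regular, idiomatic scheme" point concrete, here is a minimal
sketch of a scope guard for an integer handle. Everything named here
(fake_open, fake_close, ScopedHandle, process_leaky, process_scoped,
live_handles) is hypothetical and invented for illustration; the fake_*
functions stand in for a real C API, and none of this is taken from the
actual tree.

```cpp
#include <cassert>

// Hypothetical stand-ins for a C API that hands out integer handles.
static int g_live_handles = 0;  // handles opened but not yet closed
static int g_next_handle = 3;

int fake_open() {
  ++g_live_handles;
  return g_next_handle++;
}

void fake_close(int h) {
  assert(h >= 0);
  --g_live_handles;
}

int live_handles() { return g_live_handles; }

// Hypothetical scope guard: closes the handle on every exit path,
// including early returns (and exceptions, if they were enabled).
class ScopedHandle {
 public:
  explicit ScopedHandle(int h) : handle_(h) {}
  ~ScopedHandle() {
    if (handle_ >= 0) fake_close(handle_);
  }
  ScopedHandle(const ScopedHandle&) = delete;
  ScopedHandle& operator=(const ScopedHandle&) = delete;

  int get() const { return handle_; }

  // Give up ownership; the caller becomes responsible for closing.
  int release() {
    int h = handle_;
    handle_ = -1;
    return h;
  }

 private:
  int handle_;
};

// Manual cleanup: the early return added by a later refactoring
// skips fake_close, so the handle leaks.
bool process_leaky(bool fail_early) {
  int h = fake_open();
  if (fail_early) return false;  // leak: h is never closed
  fake_close(h);
  return true;
}

// Same logic with the guard: no exit path can skip the close.
bool process_scoped(bool fail_early) {
  ScopedHandle h(fake_open());
  if (fail_early) return false;
  return true;
}
```

To the compiler the handle is still just an int; it is the wrapper type
that gives an analysis (or a reviewer) something to recognize.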
The GC hazard static analysis uses sixgill, but really only the part
that produces a simplified version of the control flow graph. Brian
implemented some dataflow and general analysis infrastructure in
sixgill, but we don't make use of any of that; we just process the CFG
with custom JS code. I don't know much about the "built-in" constraint
solver that sixgill uses; Brian would have to talk about that. Also,
sixgill definitely doesn't handle all C++ features. I've had to teach it
about a couple that turned out to be needed for the GC analysis, but
there are a number of others that it simply punts on. (Code using those
features is just discarded.) Implementing more features is certainly
possible, but requires understanding the (undocumented?
underdocumented?) GCC internal data structures. Admittedly, many fancy
C++ features are handled earlier, so sixgill doesn't need to do anything
for them. What we have now is certainly adequate for the GC hazard
analysis, but it's something of an open question as to whether it would
be adequate for exception safety.
The bigger issue is that, at least the way we're using it now, sixgill
doesn't give you anything for dataflow other than a simplified CFG.
You'd have to implement that. For intraprocedural stuff, that's probably
not too horribly difficult if there are relatively straightforward ways
of recognizing (or annotating) values of interest, but it's a lot of
busywork.
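
As a rough illustration of what the intraprocedural part might look like,
here is a sketch of a forward "may still hold the resource" pass over a
simplified CFG. The Block and Op types and find_leaky_exits are
hypothetical names made up for this example; this is not sixgill's
representation and not how the GC analysis is written (that is custom JS).
It assumes each block has already been classified as acquiring, releasing,
or not touching a single resource of interest, which is exactly the
recognition/annotation step that is the hard part.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical classification of what a block does to the resource.
enum class Op { Nop, Acquire, Release };

struct Block {
  Op op;
  std::vector<std::size_t> succs;  // successor indices; empty = function exit
};

// Transfer function: is the resource possibly held after this block?
static bool held_after(const Block& b, bool held_in) {
  if (b.op == Op::Acquire) return true;
  if (b.op == Op::Release) return false;
  return held_in;
}

// Returns the exit blocks that can be reached with the resource possibly
// still held. Block 0 is the entry. The join is a logical OR ("may"
// analysis), so this over-reports rather than under-reports.
std::vector<std::size_t> find_leaky_exits(const std::vector<Block>& cfg) {
  assert(!cfg.empty());
  std::vector<bool> reached(cfg.size(), false);
  std::vector<bool> held_in(cfg.size(), false);
  std::deque<std::size_t> work;
  reached[0] = true;
  work.push_back(0);

  // Worklist iteration to a fixpoint; held_in only ever flips false->true,
  // so this terminates even on loops.
  while (!work.empty()) {
    std::size_t b = work.front();
    work.pop_front();
    bool out = held_after(cfg[b], held_in[b]);
    for (std::size_t s : cfg[b].succs) {
      assert(s < cfg.size());
      bool old = held_in[s];
      bool merged = old || out;
      if (!reached[s] || merged != old) {
        reached[s] = true;
        held_in[s] = merged;
        work.push_back(s);
      }
    }
  }

  std::vector<std::size_t> leaks;
  for (std::size_t b = 0; b < cfg.size(); ++b) {
    if (reached[b] && cfg[b].succs.empty() &&
        held_after(cfg[b], held_in[b])) {
      leaks.push_back(b);
    }
  }
  return leaks;
}
```

A real version would need to track many values at once and cope with
aliasing and ownership transfer, which is where the busywork comes in.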
Also, the call graph is necessarily conservative. Any function pointer
is assumed to be able to call anything. Can function pointers be labeled
noexcept? Also, while it does (conservatively) handle virtual function
calls, it currently allows for binary extensions, so unless annotated
otherwise, it assumes that you might override any
virtual method with one that invokes arbitrary code. But maybe we're
comfortable disallowing that now.
Oh, and the "annotations on various C++ types" currently take the form
of "a hardcoded list of types stored in a JS script", but that's fixed
in a version that I still haven't managed to deploy because b2g is
giving me trouble for unrelated reasons. Now you can do something like
struct MyTaggedPointer { ... } JS_HAZ_GC_POINTER; which boils down to
__attribute__((tag("GC Pointer"))). And similar for functions, which
introduces the possibility of having __attribute__((tag("throws
exceptions, yo"))) and then using the same code to do callgraph
traversals for the ones currently tagged __attribute__((tag("GC
Call"))). In other words: it is easy to add additional analyses, as long
as they want to do pretty much the same thing as the existing GC
analysis. I'm sure each analysis will need its own ugly collection of
special cases and things. As I said, though, tagging things as "leakable
resource" is not at all trivial, due to the lack of dataflow. I don't
even know if the dataflow would be precise enough in practice.
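
For what it's worth, the tag-driven callgraph traversal is simple enough
to sketch. The following is a toy model in C++, not the real analysis
(which is JS run over sixgill's output); FunctionInfo, CallGraph, and
may_reach_tag are hypothetical names for illustration. It assumes the
conservative rules described above: a call through a function pointer, or
to a function the graph knows nothing about, is treated as able to reach
anything, including a tagged function.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-function summary extracted from the callgraph.
struct FunctionInfo {
  std::vector<std::string> callees;  // direct calls only
  bool tagged = false;               // carries the tag of interest
  bool has_indirect_call = false;    // calls through a pointer: target unknown
};

using CallGraph = std::map<std::string, FunctionInfo>;

// Returns every function that may (transitively) reach a tagged function.
// The same loop works for any tag: "GC Call", "throws exceptions", etc.
std::set<std::string> may_reach_tag(const CallGraph& graph) {
  std::set<std::string> result;
  bool changed = true;
  while (changed) {  // iterate until no new function is added
    changed = false;
    for (const auto& entry : graph) {
      const std::string& name = entry.first;
      const FunctionInfo& info = entry.second;
      if (result.count(name)) continue;

      // Conservative: an indirect call could go anywhere.
      bool hit = info.tagged || info.has_indirect_call;
      for (const std::string& callee : info.callees) {
        // Unknown callee (not in the graph) is also assumed to reach the tag.
        if (!graph.count(callee) || result.count(callee)) hit = true;
      }
      if (hit) {
        result.insert(name);
        changed = true;
      }
    }
  }
  assert(result.size() <= graph.size());
  return result;
}
```

Note how one function-pointer call is enough to taint a function and all
of its callers, which is why annotations to override the conservative
default end up being necessary in practice.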
_______________________________________________
dev-tech-js-engine-internals mailing list
dev-tech-js-engine-internals@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals