On 08/09/2013 05:27 PM, Jim Blandy wrote:
> On 08/09/2013 04:29 PM, Nicolas B. Pierron wrote:
>> The goal of the tainting is to reconstruct the inverted data-flow graph,
>> i.e. to find the origin of a string that flows into a function. And the
>> data-flow graph is basically what we monitor when we register that new
>> values can be seen flowing into a store at a specific code location.
>> I think that if we want to capture this kind of information, we should at
>> least do it in such a way that we can also use it to improve our
>> performance. If we are able to isolate the data flow, we could optimize
>> our data representation based on guarded invariants of the data flow
>> (dynamic deforestation?), and with the support of a moving GC, we could
>> optimize/deoptimize the value representation on GCs. [to be seen as a JIT
>> compiler for the data flow instead of only having JIT compilers for the
>> control flow]
> It's true that, in principle, the flow graph the compiler uses and the flow
> graph taint analysis uses are the same. But in practice they're very different.
>
>  * Taint is concerned with flow *through* string primitives:
>    concatenation, substring, regexp match extraction, and so on. The
>    compiler doesn't know much about those operations, and so is only
>    concerned with getting them their arguments, and delivering their
>    results to the right place. It doesn't relate their inputs to their
>    outputs.
This is a problem of instrumentation, and it would still exist even with
tainting. Also, as I mentioned to Ivan, monitoring strings is an
approximation, as a string might be given to JSON.parse or converted into an
Array/TypedArray.
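
To make that approximation concrete, here is a minimal standalone C++ sketch
(all names are invented for illustration; none of this is engine code):
concatenation and substring relate their inputs to their outputs by unioning
origin sets, while a JSON.parse-like decoding step returns a non-string value
and silently drops the labels.

  #include <cstddef>
  #include <cstdlib>
  #include <set>
  #include <string>
  #include <vector>

  struct TaintedString {
      std::string chars;
      std::set<int> origins;  // ids of the sources this string flowed from
  };

  // Concatenation relates inputs to output: the result carries both origin sets.
  TaintedString concat(const TaintedString& a, const TaintedString& b) {
      TaintedString r{a.chars + b.chars, a.origins};
      r.origins.insert(b.origins.begin(), b.origins.end());
      return r;
  }

  // Substring keeps the origins of the string it was cut from.
  TaintedString substring(const TaintedString& s, std::size_t begin, std::size_t len) {
      return {s.chars.substr(begin, len), s.origins};
  }

  // A JSON.parse-like step: the output is no longer a string, so a
  // string-only analysis stops seeing the flow at this point.
  std::vector<double> parseNumbers(const TaintedString& s) {
      std::vector<double> out;
      const char* p = s.chars.c_str();
      char* end = nullptr;
      for (double v = std::strtod(p, &end); end != p; v = std::strtod(p, &end)) {
          out.push_back(v);
          p = end;
      }
      return out;  // the origin set is dropped here
  }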
>  * Taint needs to dynamically observe the flow of values in specific
>    actual executions. If a particular branch isn't taken, then the
>    not-executed code shouldn't affect taint results. But the compiler
>    needs to reach conservative conclusions that hold on all possible
>    executions.
This would be true in the case of a static compiler, but in the case of a
dynamic compiler, we can omit information based on the monitored flow. In
fact, TI already restricts the possible types to the observed types, and it
is for this precise reason that we need to insert type barriers in
IonMonkey's code, when the set of observed types is not equal to the upper
bound computed by the type inference.
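
As a rough sketch of that mechanism (not IonMonkey's actual code; names and
types are invented for illustration): the compiled code assumes only the set
of observed types, and the barrier catches any value outside of it so that
the set can be widened and the optimized code invalidated.

  #include <set>

  enum class Type { Int32, Double, String, Object };

  struct TypeSet {
      std::set<Type> observed;  // types seen so far at this code location
      bool has(Type t) const { return observed.count(t) != 0; }
  };

  // Inserted where the observed types are a strict subset of the inferred
  // upper bound: a new incoming type invalidates the optimized assumptions.
  bool typeBarrier(TypeSet& site, Type incoming) {
      if (site.has(incoming))
          return true;                // fast path: the assumption still holds
      site.observed.insert(incoming); // record the new type...
      return false;                   // ...and request a bailout/recompilation
  }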
> What you propose would require substantial contributions from a group of
> engineers (IonMonkey and GC hackers) that is in high demand; it's hard for
> me to imagine taint support becoming a sufficient priority for that team -
> especially since it's an unproven approach. In contrast, the taint analysis
> I brought up here has been prototyped and shown to be valuable, and is
> within reach of a volunteer (Ivan) from the security team.
One of the reasons why I would prefer us to depend on such information is
that our focus is on performance. If a bug or an incorrect value appeared
in the analysis, it would likely surface as a performance issue or as
incorrect behavior. The reason I want a performance justification for doing
this analysis is that we would then rely on it ourselves, and keep making
it better.
As a side note: we currently maintain, conditionally, an artificial stack
alongside the Interpreter, Baseline, and IonMonkey. This stack is only used
by the Gecko profiler. Worse, tbpl does not even run tests on the JS engine
to ensure that we keep it in a correct shape. So using the information
collected by this profiling could be helpful in many ways, such as finding
functions which are worth keeping across GCs.
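
For readers who have not seen it, the idea looks roughly like the sketch
below (illustrative names, not the actual profiler types): entries are
pushed and popped alongside the real frames, and the sampler reads the
stack asynchronously, which is why the depth is published last.

  #include <atomic>
  #include <cstdint>

  struct PseudoStackEntry {
      const char* label;   // e.g. the function name
      const void* script;  // identifies the executing script, if any
  };

  struct PseudoStack {
      static const uint32_t kMaxEntries = 1024;
      PseudoStackEntry entries[kMaxEntries];
      std::atomic<uint32_t> depth{0};

      void push(const char* label, const void* script) {
          uint32_t d = depth.load(std::memory_order_relaxed);
          if (d < kMaxEntries)
              entries[d] = {label, script};
          // The sampler keys off the depth, so publish it after the entry.
          depth.store(d + 1, std::memory_order_release);
      }

      void pop() {
          depth.store(depth.load(std::memory_order_relaxed) - 1,
                      std::memory_order_release);
      }
  };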
Another example is the type inference. Currently we collect a lot of
information which is valuable for the developer tools. Sadly, we do not
have a well-detailed API to make it usable outside the engine. But the
fact that we rely on it ourselves ensures that the type information we see
is better than what any static analysis tool would produce.
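
Purely as a hypothetical shape for such an API (to be clear: nothing like
this exists in the engine today, and all names are made up), the observed
sets could be keyed by (script, bytecode offset) and queried by the
devtools:

  #include <cstdint>
  #include <map>
  #include <set>
  #include <string>
  #include <utility>

  using TypeName = std::string;
  using SiteKey = std::pair<const void*, uint32_t>;  // (script, bytecode offset)

  struct ObservedTypeInfo {
      std::map<SiteKey, std::set<TypeName>> observed;

      // Filled in by the engine as values are monitored.
      void record(SiteKey site, const TypeName& t) { observed[site].insert(t); }

      // Queried by the devtools: what actually flowed through this site?
      const std::set<TypeName>& query(SiteKey site) { return observed[site]; }
  };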
--
Nicolas B. Pierron