On 08/09/2013 05:27 PM, Jim Blandy wrote:
On 08/09/2013 04:29 PM, Nicolas B. Pierron wrote:
The goal of the tainting is to reconstruct the inverted data-flow graph,
i.e. to find the origin of a string which flows into a function. And the
data-flow graph is basically what is monitored when we register that we
can see new values flowing into a store at a specific code location.

I think that if we want to capture this kind of information, we should at
least make it in such a way that we can also use it to improve our
performance.  If we are able to isolate the data-flow, we could optimize
our data representation based on guarded invariants of the data-flow
(dynamic deforestation?), and with the support of a moving GC, we could
optimize/deoptimize the value representation on GCs.  [to be seen as a JIT
compiler for the data flow instead of only having JIT compilers for the
control flow]

It's true that, in principle, the flow graph the compiler uses and the flow
graph taint analysis uses are the same. But in practice they're very different.

  * Taint is concerned with flow *through* string primitives:
    concatenation, substring, regexp match extraction, and so on. The
    compiler doesn't know much about those operations, and so is only
    concerned with getting them their arguments, and delivering their
    results to the right place. It doesn't relate their inputs to their
    outputs.

This is a problem of instrumentation, and it would still exist even with tainting. Also, as I mentioned to Ivan, monitoring strings is only an approximation, as a string might be given to JSON.parse or converted into an Array/TypedArray.
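
To make the approximation concrete, here is a minimal sketch in Python rather than engine code. Every name in it (TaintedStr, concat, substring, to_char_codes, the origin labels) is hypothetical and none of it is SpiderMonkey's or Ivan's implementation. It assumes taint is a set of origin labels carried alongside a string, propagated by the string primitives, and lost as soon as the string is converted into something that is not instrumented.

```python
# Hypothetical sketch: none of these names exist in SpiderMonkey.
class TaintedStr:
    """A string paired with the set of labels naming where its content came from."""

    def __init__(self, value, origins=frozenset()):
        self.value = value
        self.origins = frozenset(origins)

    def concat(self, other):
        # The result depends on both operands, so it inherits both origin sets.
        return TaintedStr(self.value + other.value, self.origins | other.origins)

    def substring(self, start, end):
        # Coarse approximation: any slice keeps all origins of the whole string.
        return TaintedStr(self.value[start:end], self.origins)

    def to_char_codes(self):
        # Conversion to a plain list: unless this path is instrumented as well,
        # the origins are dropped here.
        return [ord(c) for c in self.value]


url = TaintedStr("http://example.org/?q=", {"literal"})
query = TaintedStr("user input", {"location.search"})
joined = url.concat(query).substring(0, 30)  # still carries both origins
codes = joined.to_char_codes()               # a plain list, origins are gone
```

The last line is the case I am worried about: once the data leaves the string world, string-only monitoring no longer sees where it came from.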

  * Taint needs to dynamically observe the flow of values in specific
    actual executions. If a particular branch isn't taken, then the
    not-executed code shouldn't affect taint results. But the compiler
    needs to reach conservative conclusions that hold on all possible
    executions.

This would be true in the case of a static compiler, but in the case of a dynamic compiler, we can omit information based on the monitored flow. In fact, TI already restricts the possible types to the observed types, and it is for this precise reason that we need to insert type barriers in IonMonkey's code when the set of observed types is not equal to the upper bound calculated by the type inference.
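
As a rough illustration of that rule, here is a sketch in Python. The names (Bailout, needs_type_barrier, type_barrier) and the use of Python type names as stand-ins for JS types are assumptions made for the example; this is not the actual TI or IonMonkey API.

```python
# Hypothetical sketch; these names are not the real TI or IonMonkey API.
class Bailout(Exception):
    """Raised when a value falls outside what the compiled code assumed."""


def needs_type_barrier(observed, upper_bound):
    # A barrier is only required when the compiled code assumes something
    # different from what inference says could reach this point.
    return set(observed) != set(upper_bound)


def type_barrier(value, observed):
    # Guard in the compiled code: let observed types through, otherwise bail
    # out so the engine can record the new type and recompile.
    if type(value).__name__ not in observed:
        raise Bailout(type(value).__name__)
    return value


observed = {"int"}            # types seen so far at this code location
upper_bound = {"int", "str"}  # what inference says is possible here
```

With these sets, needs_type_barrier(observed, upper_bound) is true, type_barrier(3, observed) passes the value through, and type_barrier("x", observed) raises Bailout. The point is that the compiled code is specialized on what was monitored, not on every execution that is possible in principle.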

What you propose would require substantial contributions from a group of
engineers (IonMonkey and GC hackers) that is in high demand; it's hard for
me to imagine taint support becoming a sufficient priority for that team -
especially since it's an unproven approach. In contrast, the taint analysis
I brought up here has been prototyped and shown to be valuable, and is
within reach of a volunteer (Ivan) from the security team.

One of the reasons why I would prefer us to depend on such information is that our focus is set on performance. If a bug or an incorrect value appears in the analysis, then it would likely be related to a performance issue or an incorrect behavior. The reason I want to find a performance motivation for doing this analysis is that we could then rely on it and make it better.

As a side note: currently, we conditionally maintain an artificial stack alongside the Interpreter, Baseline and IonMonkey. This stack is only used by the Gecko profiler. Worse, tbpl does not even run tests on the JS engine to ensure that we keep it in a correct shape. So using the information collected by this profiling could be helpful in many ways, such as finding functions which are worth keeping across GCs.

Another example is the type inference. Currently we collect a lot of information which is valuable for the developer tools. Sadly, we do not have a well-detailed API to make it usable outside the engine. But the fact that we rely on it ensures that the type information we see would be better than that of any static analysis tool.

--
Nicolas B. Pierron

_______________________________________________
dev-tech-js-engine-internals mailing list
dev-tech-js-engine-internals@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
