Hi! To support our heap analysis developer tools, and for many other hacks, we'd like to have a mode in which we record the JavaScript call stack under which each object is allocated. Our lovely object metadata API should work nicely for this. However, I'm expecting that doing a stack walk every time we allocate an object will have a significant performance impact; after we've done our first cut simply using good old ScriptFrameIter, we'll want to look at optimizations.

In keeping with established software development tradition regarding performance, I would like to discuss two possible optimizations before I have obtained any actual data about how much time the obvious solution takes. I'd love to hear that these seem infeasible or not valuable, because that would save us work. :)


First optimization: AbstractCodePtr

When we're running bytecode, looking up the corresponding source location entails parsing source notes until we have reached the desired bytecode location, by which time the source notes have told us the current line and column number. So this lookup is linear in the length of the source notes.
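To make the cost concrete, here is a rough sketch of that kind of linear walk. The SimpleNote encoding, the SourcePos struct and the lookupPosition function are all made up for illustration; the engine's real source note format is more compact and more involved than this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified source note. This is NOT the engine's real
// encoding; it only models "deltas that must be accumulated in order".
struct SimpleNote {
    uint32_t offsetDelta;  // bytecode bytes advanced since the previous note
    uint32_t lineDelta;    // lines advanced since the previous note
    uint32_t column;       // column in effect from this note onward
};

struct SourcePos {
    uint32_t line;
    uint32_t column;
};

// Walk the notes from the beginning, accumulating deltas, until the next
// note would start past targetOffset. The cost is linear in the number of
// notes that precede the target.
SourcePos lookupPosition(const std::vector<SimpleNote>& notes,
                         uint32_t firstLine, uint32_t targetOffset) {
    SourcePos pos{firstLine, 0};
    uint32_t offset = 0;
    for (const SimpleNote& note : notes) {
        offset += note.offsetDelta;
        if (offset > targetOffset)
            break;
        pos.line += note.lineDelta;
        pos.column = note.column;
    }
    return pos;
}
```

Because the notes only carry deltas, there is no way to jump straight to the target offset without some extra index; every lookup starts from the top.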

Similarly, when we're running IonMonkey code, finding the corresponding source location entails looking up the OsiIndex for the given return address, and then (I gather) consulting the snapshot for more details. And that work yields a JSScript and bytecode offset, which must be looked up in the source notes, as above.

But for profiling - and, I suspect, many other uses - this work is usually wasted: we often capture stacks that we don't print. So we should put off all these lookups as long as possible.

It seemed to me that we could minimize the actual lookups by representing code positions using a type that was quick to construct, and put off doing the lookups until asked. This AbstractCodePtr class could store an <IonScript, displacement> pair, or a <JSScript, bytecode offset> pair, or an actual <URL, line, column> - and mutate itself from lazier to more reified forms on demand.
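A minimal sketch of the shape I have in mind, assuming C++17's std::variant is acceptable for illustration. FakeScript, FakeIonScript and their lookup arithmetic are stand-ins invented for this example, not the real JSScript or IonScript interfaces; only the lazy-to-reified progression is the point.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

// Fully reified position.
struct ResolvedPos {
    std::string url;
    uint32_t line;
    uint32_t column;
};

// Hypothetical stand-in for a script with an expensive position lookup.
struct FakeScript {
    std::string url;
    uint32_t firstLine;
    mutable int lookups = 0;  // counts how often the slow path runs

    // Pretend lookup: one source line per four bytecode bytes.
    ResolvedPos resolve(uint32_t offset) const {
        ++lookups;
        return {url, firstLine + offset / 4, 0};
    }
};

// Hypothetical stand-in for compiled code that maps back to bytecode.
struct FakeIonScript {
    const FakeScript* script;

    // Pretend mapping: two bytes of machine code per bytecode byte.
    std::pair<const FakeScript*, uint32_t> toBytecode(uint32_t displacement) const {
        return {script, displacement / 2};
    }
};

class AbstractCodePtr {
    struct IonPos { const FakeIonScript* ion; uint32_t displacement; };
    struct BytecodePos { const FakeScript* script; uint32_t offset; };

    // Held in whichever form was cheapest to capture.
    std::variant<IonPos, BytecodePos, ResolvedPos> state_;

  public:
    AbstractCodePtr(const FakeIonScript* ion, uint32_t displacement)
        : state_(IonPos{ion, displacement}) {}
    AbstractCodePtr(const FakeScript* script, uint32_t offset)
        : state_(BytecodePos{script, offset}) {}

    bool isResolved() const {
        return std::holds_alternative<ResolvedPos>(state_);
    }

    // Step from lazier to more reified forms, doing each lookup at most once.
    const ResolvedPos& resolve() {
        if (auto* ion = std::get_if<IonPos>(&state_)) {
            auto bytecode = ion->ion->toBytecode(ion->displacement);
            state_ = BytecodePos{bytecode.first, bytecode.second};
        }
        if (auto* bc = std::get_if<BytecodePos>(&state_)) {
            ResolvedPos pos = bc->script->resolve(bc->offset);
            state_ = std::move(pos);
        }
        return std::get<ResolvedPos>(state_);
    }
};
```

Constructing one of these is just storing a pointer and an integer; the lookups only run if someone calls resolve(), and a second call returns the stored answer.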

If each compartment stored a HashMap from scripts to sets of AbstractCodePtrs in that script, then the destructors for IonScripts and JSScripts could do a just-in-time de-lazification, so that holding an AbstractCodePtr needn't force the underlying IonScript or JSScript to be held alive as well. An AbstractCodePtr would simply delazify itself as needed to allow its referent to die first.
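Roughly, and with every name here (Compartment, Script, LazyPos, capture, forget) invented for the sketch rather than taken from the engine, the bookkeeping could look like this. Standard containers stand in for the engine's HashMap, and the position arithmetic is a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

struct Script;

// Simplified lazy position: either points at a live script, or (once
// script is null) carries its own reified url and line.
struct LazyPos {
    const Script* script = nullptr;
    uint32_t offset = 0;
    std::string url;
    uint32_t line = 0;
};

// Hypothetical per-compartment table: script -> lazy positions into it.
struct Compartment {
    std::unordered_map<const Script*, std::unordered_set<LazyPos*>> lazyPositions;
};

struct Script {
    Compartment& comp;
    std::string url;
    uint32_t firstLine;

    Script(Compartment& c, std::string u, uint32_t l)
        : comp(c), url(std::move(u)), firstLine(l) {}

    // Just-in-time delazification: reify every position that still refers
    // to this script, so none of them is left dangling.
    ~Script() {
        auto it = comp.lazyPositions.find(this);
        if (it == comp.lazyPositions.end())
            return;
        for (LazyPos* pos : it->second) {
            pos->url = url;
            pos->line = firstLine + pos->offset / 4;  // placeholder lookup
            pos->script = nullptr;
        }
        comp.lazyPositions.erase(it);
    }
};

// Record a lazy position and register it with the compartment.
void capture(Compartment& comp, const Script& script, uint32_t offset, LazyPos& out) {
    out.script = &script;
    out.offset = offset;
    comp.lazyPositions[&script].insert(&out);
}

// A position that dies before its script must unregister itself.
void forget(Compartment& comp, LazyPos& pos) {
    if (!pos.script)
        return;
    auto it = comp.lazyPositions.find(pos.script);
    if (it != comp.lazyPositions.end()) {
        it->second.erase(&pos);
        if (it->second.empty())
            comp.lazyPositions.erase(it);
    }
    pos.script = nullptr;
}
```

The cost moves from capture time to script destruction time, and is only paid for positions that are still alive and still lazy at that point.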


Second optimization: recognizing stack prefixes we've already unwound

When unwinding the stack for profiling, the frames at the older end of the stack are going to get walked over and over. It would be helpful if we could have a bit available on stack frames that is initially clear, but which we can set to indicate that we have cached the rest of the stack somewhere. js::StackFrame already has a flags field whose upper bits are zeroed. In IonFrames, a bit in the descriptor would work for this, if one is available; pushing descriptors with an extra zero bit next to the constructing bit should have no runtime cost.
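Here is a toy model of the idea, independent of the real frame layouts. Frame, SavedFrame, StackCache and the HAS_CACHED_STACK bit are all hypothetical names for this sketch; it assumes captured stacks are stored as parent-linked nodes so that a cached tail can be shared between captures.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical flag bit, assumed to be unused and initially clear.
constexpr uint32_t HAS_CACHED_STACK = 1u << 31;

// Toy stand-in for a live stack frame.
struct Frame {
    std::string function;
    Frame* older;        // next older frame, or nullptr at the base
    uint32_t flags = 0;
};

// Captured stacks are immutable, parent-linked nodes, so stacks that
// share an older tail share the same nodes.
struct SavedFrame {
    std::string function;
    std::shared_ptr<const SavedFrame> parent;
};
using SavedStack = std::shared_ptr<const SavedFrame>;

struct StackCache {
    std::unordered_map<const Frame*, SavedStack> saved;
    int framesWalked = 0;  // frames actually visited, for illustration

    SavedStack capture(Frame* youngest) {
        std::vector<Frame*> fresh;
        SavedStack tail;
        // Walk toward older frames until we reach one already captured.
        for (Frame* f = youngest; f; f = f->older) {
            if (f->flags & HAS_CACHED_STACK) {
                tail = saved.at(f);
                break;
            }
            ++framesWalked;
            fresh.push_back(f);
        }
        // Build nodes for the new frames, oldest first, marking each one.
        for (auto it = fresh.rbegin(); it != fresh.rend(); ++it) {
            tail = std::make_shared<SavedFrame>(SavedFrame{(*it)->function, tail});
            saved[*it] = tail;
            (*it)->flags |= HAS_CACHED_STACK;
        }
        return tail;
    }
};
```

With a stack c -> b -> a captured once, a later capture from a new frame d called from b walks only d and reuses the saved b -> a tail. A real implementation would also have to drop cache entries as frames are popped; this sketch just overwrites the entry the next time a frame at the same address is captured with its bit clear.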
_______________________________________________
dev-tech-js-engine-internals mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
