Re: [JS-internals] Reducing SpiderMonkey's crash rate

Jan de Mooij Thu, 28 Apr 2016 08:51:57 -0700

On Thu, Apr 28, 2016 at 5:01 PM, Benjamin Smedberg <[email protected]> wrote:


> I suggest that we need to start out by reducing the scope of the
> "unknowns". In particular I am scared because we have a *lot* of crashes in
> JIT code and we don't know what kind of crashes these are, in general:
>

That makes sense - I will take a look at a number of JIT code crashes this
week and categorize them manually. We've done this before, but W^X and other
changes may have shifted the picture a bit.


> Do we do it for all GCed
> objects nowadays, and for JIT code that we believe is done?


We poison JIT code in all builds. I think we have some Nightly-only GC
poisoning.


> We have work to
> do to make the poison value point to inaccessible memory instead of a NOP
> slide (ping me for a bug#). If we can use different poison values for dead
> JIT (executable) memory and dead object (non-executable) memory, that would
> also help distinguish things.


I changed the JIT poison value this week (bug 1267557). It's now a value
that will crash on all platforms. I'll try to get that backported.

Thanks,
Jan

On Thu, Apr 28, 2016 at 2:48 AM, Nicholas Nethercote <[email protected]> wrote:

> Hi,
>
> Project Uptime (https://wiki.mozilla.org/Platform/Uptime) is underway.
> Its goal is to reduce the crash rate of Firefox (desktop and mobile).
> And SpiderMonkey accounts for a significant fraction of those crashes.
>
> SM presents some particular challenges, notably the JITs and the GC.
>
> First, JITs and GC are both inherently unsafe things. They involve lots
> of raw memory and code manipulation, areas where we have less
> protection than in normal C++. (These are things that even Rust
> wouldn't help with much, because we'd have to write big chunks of them
> in unsafe code.)
>
> Second, crash reports from bugs in the JITs and the GC often have less
> info than normal crash reports. For the JITs that's because the stack
> traces are unhelpful -- e.g. so many crashes aggregate under
> EnterBaseline. For the GC that's because a GC crash is often
> triggered by buggy code (be it in the GC itself, or elsewhere) that
> ran substantially earlier.
>
> This is a good moment to think hard about how we can improve things.
>
> - Can we use static and dynamic analysis tools more? (Even simple
> things like bug 1267551 can help.)
>
> - How can we get better data in JIT and GC crash reports?
>
> - Would "extended assertions" help? By this I mean verification passes
> over complex data structures. Compilers often have these, e.g. after
> each pass you can optionally run a pass that does a thorough sanity
> check of the IR. Do we have that for the JITs? Would something like
> that make sense for GC? ("Code generators and garbage collectors
> should crash as early and as loudly as possible.")
>
> - What defensive programming measures can we add in? What code
> patterns are error-prone and should be avoided?
>
> - How can we respond to problems? E.g. bug 1232229 is a case where
> a more aggressive approach to backouts would likely have led to the
> topcrash being diagnosed much earlier than it eventually was.
>
> - Could user telemetry be used to identify parts of SM that aren't
> exercised much in Nightly/Aurora/Beta?
>
> I'd love to hear ideas.
>
> Nick
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> [email protected]
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
>