On Thu, Apr 28, 2016 at 5:01 PM, Benjamin Smedberg <[email protected]> wrote:
> I suggest that we need to start out by reducing the scope of the
> "unknowns". In particular I am scared because we have a *lot* of crashes in
> JIT code and we don't know what kind of crashes these are, in general:
>

That makes sense - I will take a look at a number of JIT code crashes
this week and categorize them manually. We've done this before but W^X
and other things may have changed this a bit.

> Do we do it for all GCed
> objects nowadays, and for JIT code that we believe is done?

We poison JIT code in all builds. I think we have some Nightly-only GC
poisoning.

> We have work to
> do to make the poison value point to inaccessible memory instead of a NOP
> slide (ping me for a bug#). If we can use different poison values for dead
> JIT (executable) memory and dead object (non-executable) memory, that would
> also help distinguish things.

I changed the JIT poison value this week (bug 1267557). It's now a value
that will crash on all platforms. I'll try to get that backported.

Thanks,
Jan

On Thu, Apr 28, 2016 at 2:48 AM, Nicholas Nethercote <[email protected]> wrote:

> Hi,
>
> Project Uptime (https://wiki.mozilla.org/Platform/Uptime) is underway.
> Its goal is to reduce the crash rate of Firefox (desktop and mobile).
> And SpiderMonkey accounts for a significant fraction of those crashes.
>
> SM provides some particular challenges, in particular the JITs and the GC.
>
> First, JITs and GC are both inherently unsafe things. Lots of raw
> memory manipulation, code manipulation, areas where we have less
> protection than normal C++. (These are things that even Rust wouldn't
> help with much, because we'd have to write big chunks of them in
> unsafe code.)
>
> Second, crash reports from bugs in the JITs and the GC often have less
> info than normal crash reports. For the JITs that's because the stack
> traces are unhelpful -- e.g. so many crashes aggregate under
> EnterBaseline.
> For the GCs that's because a GC crash is often
> triggered by buggy code (be it in the GC itself, or elsewhere) that
> ran substantially earlier.
>
> This is a good moment to think hard about how we can improve things.
>
> - Can we use static and dynamic analysis tools more? (Even simple
>   things like bug 1267551 can help.)
>
> - How can we get better data in JIT and GC crash reports?
>
> - Would "extended assertions" help? By this I mean verification passes
>   over complex data structures. Compilers often have these, e.g. after
>   each pass you can optionally run a pass that does a thorough sanity
>   check of the IR. Do we have that for the JITs? Would something like
>   that make sense for GC? ("Code generators and garbage collectors
>   should crash as early and as loudly as possible.")
>
> - What defensive programming measures can we add in? What code
>   patterns are error-prone and should be avoided?
>
> - How can we respond to problems? E.g. bug 1232229 is an example where
>   a more aggressive approach to backouts would likely have resulted in a
>   topcrash diagnosis occurring a lot earlier than it eventually did.
>
> - Could user telemetry be used to identify parts of SM that aren't
>   exercised much in Nightly/Aurora/Beta?
>
> I'd love to hear ideas.
>
> Nick
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> [email protected]
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
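
[Illustration, not part of the original thread.] The "extended
assertions" idea quoted above (a sanity-check pass over the IR that can
be run after every compiler pass) could look roughly like the sketch
below. This is not SpiderMonkey code: the Instr type, the two opcodes,
and the names verifyIR and checkAfterPass are all hypothetical, made up
here for a toy IR in which an instruction may only use results defined
earlier in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical toy IR (an assumption for illustration, not SpiderMonkey's MIR).
struct Instr {
    std::string op;                 // "const" or "add" in this sketch
    std::vector<std::size_t> args;  // indices of the operand instructions
};

// Verification pass: returns an empty string when every invariant holds,
// otherwise a description of the first violation found.
std::string verifyIR(const std::vector<Instr>& ir) {
    for (std::size_t i = 0; i < ir.size(); ++i) {
        const Instr& ins = ir[i];
        const std::string where = "instr " + std::to_string(i) + ": ";
        if (ins.op == "const" && !ins.args.empty())
            return where + "const takes no operands";
        if (ins.op == "add" && ins.args.size() != 2)
            return where + "add needs exactly two operands";
        for (std::size_t a : ins.args) {
            // Operands must refer to an instruction defined earlier.
            if (a >= i)
                return where + "operand " + std::to_string(a) +
                       " is not defined earlier";
        }
    }
    return "";
}

// Intended to be called after each pass: abort at the first broken
// invariant, naming the pass that just ran.
void checkAfterPass(const std::vector<Instr>& ir, const char* passName) {
    const std::string err = verifyIR(ir);
    if (!err.empty()) {
        std::fprintf(stderr, "IR verification failed after %s: %s\n",
                     passName, err.c_str());
        std::abort();
    }
}
```

Calling checkAfterPass(ir, "some-pass") after each transformation means
a crash names the pass that broke the invariant, instead of surfacing
much later in generated code.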

