On Tue, Oct 6, 2015, at 01:05 PM, Fabrice Desré wrote:
> We have tools to detect regressions and report bugs. We have people
> triaging and following up on these bugs. What's left? Locking down devs
> to fix issues? If you have suggestions I'm all ears, but I'm out of
> politically correct ideas.

I think there are two important things we can do:

1) Reduce developer uncertainty about platform impact on app regressions
by providing a single "b2g performance state of the tree" resource that
is always at hand.  For example, TBPL is very nicely in your face about
the state of the branch you're dealing with (powered by
https://treestatus.mozilla.org/, which one can also consult directly).
The sheriffs are on top of the tree state and they make sure you know it
too.  My impression is that our performance team is likewise on top of
things; it's just not as readily reflected to me as a developer, and
it's very easy for me to just assume that the platform has regressed
again.  It would be great to have a green banner on raptor.mozilla.org
that says "The b2g trunk is good; all performance regressions are your
own" or a yellow banner that says "The platform is performance-busted,
impacting startup time but not memory usage; check out bug NNNNNN".
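
To make this concrete, here's a rough sketch of the sort of check a
script or dashboard widget could do.  The endpoint, URL, and field names
below are entirely made up for illustration; nothing like this exists on
raptor.mozilla.org today:

  // Hypothetical "performance tree status" check (TypeScript).  The
  // endpoint and the shape of the response are assumptions only.
  interface PerfTreeStatus {
    state: 'good' | 'perf-busted';
    affects: string[];   // e.g. ['startup'] but not ['memory']
    bug?: number;        // tracking bug number when busted
  }

  async function reportPerfTreeStatus(): Promise<void> {
    // Hypothetical URL; this endpoint does not exist yet.
    const res = await fetch('https://raptor.mozilla.org/api/tree-status');
    const status: PerfTreeStatus = await res.json();
    if (status.state === 'good') {
      console.log('The b2g trunk is good; all performance regressions ' +
                  'are your own.');
    } else {
      console.log(`Platform is performance-busted ` +
                  `(${status.affects.join(', ')}), see bug ${status.bug}.`);
    }
  }

  reportPerfTreeStatus();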

For example, it's my impression that the preallocated process mechanism
was broken for ~2.5 weeks (see
https://bugzilla.mozilla.org/show_bug.cgi?id=1204837).  From the bug,
it's clear that raptor and the performance team were on top of this
immediately.  But playing devil's advocate, my impression as a developer
was mainly that the platform is flaky, that performance regressions are
probably being dealt with by other people anyway, and that it's a lot of
work for me to figure out the current state of regressions, so I
shouldn't bother investigating them myself.

2) Reduce the activation energy for investigating performance
regressions by providing a profiler run for each raptor performance data
point.  This is sort of covered by
https://bugzilla.mozilla.org/show_bug.cgi?id=1192746.  If I am going to
investigate a performance regression, I potentially need to do a device
"context switch" where I ensure the device is running trunk, maybe do a
b2g build to get symbols, maybe update my checkout, etc.  It can be a
hassle.  And I'm just going to run the profiler myself anyway.  If
raptor and its automatically filed bugs let me (a hypothetical
developer) click directly through to a profiler run, that makes it much
easier for me to investigate and spot obvious regressions/causes, or
just find things I can improve that aren't actually a regression.  In
fact (as a hypothetical busybody developer) I might even do drive-by
profiler-run analyses on apps that aren't my own.

Of course, the profiler run will distort the performance numbers, so it
must not be one of the counted runs.  Note that I'm not suggesting
automated analysis of the profiler runs.  That would be cool, but is
probably two orders of magnitude more work.
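
Concretely, what I'm picturing is something like the sketch below, where
runOnce() is a made-up stand-in for however raptor actually launches an
app and times it: the usual counted iterations run with the profiler
off, and one extra profiled iteration is kept out of the reported
numbers entirely.

  // Sketch only: runOnce(profile) stands in for a single raptor launch
  // that returns a timing in milliseconds.
  async function measureWithProfile(
    runOnce: (profile: boolean) => Promise<number>,
    countedRuns = 30
  ): Promise<{ medianMs: number; profiledMs: number }> {
    const counted: number[] = [];
    for (let i = 0; i < countedRuns; i++) {
      counted.push(await runOnce(false));   // counted runs, profiler off
    }
    const profiledMs = await runOnce(true); // extra run, profiler on, not counted
    const sorted = [...counted].sort((a, b) => a - b);
    return { medianMs: sorted[Math.floor(sorted.length / 2)], profiledMs };
  }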

Andrew
