That sounds great!

I’ve tried to follow the steps, but when running raptor-compare I got:

$ raptor-compare ./metrics.ldjson
[TypeError: undefined is not a function]

Is there anything I can attach to make it clearer what’s going wrong?
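
In case it helps, here's what I could attach (a sketch; I'm assuming
raptor-compare is a Node.js CLI installed globally through npm, and the
package name is my guess):

$ node --version                 # runtime version
$ npm ls -g raptor-compare       # installed package version
$ head -n 1 ./metrics.ldjson     # first record of the input file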

Thanks!

> On 20 Oct 2015, at 09:04, [email protected] wrote:
> 
> Hi all!
> 
> I've been seeing a lot of people start using raptor to test the 
> performance of their patches/code, especially in the context of 2.2 -> 2.5 
> regressions.
> 
> That's awesome!
> 
> Now, on top of that, :stas has developed a neat app that helps you get *more* 
> out of those tests. In particular, it helps you learn whether the difference 
> you see is statistically significant[0].
> 
> That's important. Not perfect yet, but super important. What it means is that 
> it answers the question of whether the change you see can be explained by 
> fluctuations in results within your test.
> 
> So instead of trying to guess whether the 100ms visuallyLoaded difference you 
> see between two test results is real, install raptor-compare and follow the 
> steps below:
> 
> 1) Remove "metrics.ldjson" from the directory you are in
> 2) Run your raptor test with as many runs as you can
> 3) Apply your change
> 4) Run your raptor test with the same number of runs
> 5) raptor-compare ./metrics.ldjson
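> 
> Put together, steps 1-5 look roughly like this (a sketch; the raptor 
> invocation is a placeholder, so substitute the test command and run count 
> you actually use):
> 
> $ rm -f ./metrics.ldjson             # 1) start from a clean file
> $ raptor <your-test> --runs 30       # 2) base measurements
>    ... apply your change ...         # 3)
> $ raptor <your-test> --runs 30       # 4) same number of runs
> $ raptor-compare ./metrics.ldjson    # 5) compare the two groups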
> 
> zbraniecki@rivia:~$ raptor-compare ./metrics.ldjson
> fm.gaiamobile.org      base: mean  1: mean  1: delta  1: p-value
> ---------------------  ----------  -------  --------  ----------
> navigationLoaded              528      524        -4        0.72
> navigationInteractive         738      721       -17        0.77
> visuallyLoaded                738      721       -17        0.77
> contentInteractive            738      722       -17        0.76
> fullyLoaded                   923      903       -19        0.59
> rss                        29.595   29.412    -0.183      * 0.02
> uss                        11.098   11.001    -0.098      * 0.04
> pss                        15.050   14.970    -0.080      * 0.03
> 
> Reading the results: the most important thing is the little asterisk next to 
> the p-value[1]. A p-value below 5% suggests that the observed data is not 
> consistent with the assumption that there is no difference between the two 
> groups.
> 
> In this example, it says that there's less than a 4% chance that the USS 
> difference of almost 100 KB is random.
> At the same time, the 20ms difference in fullyLoaded could be entirely random.
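> 
> If you want to sanity-check a result like this yourself, here's a minimal 
> sketch in Python (I'm assuming a Welch's t-test, which may not be exactly 
> what raptor-compare runs internally; the sample values are made up):
> 
> from scipy import stats
> 
> # Made-up visuallyLoaded samples in ms, for illustration only.
> base    = [731, 744, 738, 729, 752, 735, 741, 736]
> patched = [722, 718, 727, 715, 730, 719, 724, 721]
> 
> # Welch's t-test does not assume the two groups have equal variance.
> t, p = stats.ttest_ind(base, patched, equal_var=False)
> print('p-value: %.3f' % p)   # below 0.05 suggests a real difference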
> 
> If you are getting a p-value above 5%, you should place less trust in your 
> results and consider rerunning your tests with more runs.
> 
> Hope that helps!
> zb.
> 
> 
> 
> [0] https://en.wikipedia.org/wiki/Statistical_significance
> [1] https://en.wikipedia.org/wiki/P-value
