That sounds great! I’ve tried to follow the steps, but when running raptor-compare I got
$ raptor-compare ./metrics.ldjson
[TypeError: undefined is not a function]

Is there anything I can attach to make it clearer what's going wrong?

Thanks!

> On 20 Oct 2015, at 09:04, [email protected] wrote:
>
> Hi all!
>
> I've been seeing a lot of people starting to use raptor for testing the
> performance of their patches/code, especially in the context of 2.2 -> 2.5
> regressions.
>
> That's awesome!
>
> Now, on top of that, :stas has developed a neat app that helps you get *more*
> out of those tests. In particular, it helps you learn whether the difference
> you see is statistically significant[0].
>
> That's important. Not perfect yet, but super important. What it means is that
> it answers the question of whether the change you see can be explained by
> fluctuations in results within your test.
>
> So instead of trying to guess whether the 100ms visuallyLoaded difference you
> see between two test results is real, install raptor-compare and follow the
> steps below:
>
> 1) Remove "metrics.ldjson" from the directory you are in
> 2) Run your raptor test with as many runs as you can
> 3) Apply your change
> 4) Run your raptor test with the same number of runs
> 5) Run raptor-compare ./metrics.ldjson
>
> zbraniecki@rivia:~$ raptor-compare ./metrics.ldjson
> fm.gaiamobile.org       base: mean   1: mean   1: delta   1: p-value
> ---------------------   ----------   -------   --------   ----------
> navigationLoaded               528       524         -4         0.72
> navigationInteractive          738       721        -17         0.77
> visuallyLoaded                 738       721        -17         0.77
> contentInteractive             738       722        -17         0.76
> fullyLoaded                    923       903        -19         0.59
> rss                         29.595    29.412     -0.183       * 0.02
> uss                         11.098    11.001     -0.098       * 0.04
> pss                         15.050    14.970     -0.080       * 0.03
>
> Reading the results - the most important thing is the little asterisk next to
> the p-value[1]. If the p-value is below 5%, it suggests that the observed data
> is not consistent with the assumption that there is no difference between the
> two groups.
>
> In this example, it says there's less than a 4% chance that the USS difference
> of almost 100 KB is just random variation.
> At the same time, the ~20ms difference in fullyLoaded could be entirely random.
>
> If you are getting a p-value above 5%, you should put less trust in your
> results and consider rerunning your tests with more runs.
>
> Hope that helps!
> zb.
>
>
> [0] https://en.wikipedia.org/wiki/Statistical_significance
> [1] https://en.wikipedia.org/wiki/P-value
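For anyone who wants to poke at the numbers directly: the .ldjson extension
suggests line-delimited JSON, i.e. one JSON object per line, so the file can be
read with very little code. The TypeScript sketch below only illustrates that
idea; the record fields it uses (metric, value, label) are made-up placeholders,
not raptor's actual schema.

import { readFileSync } from "fs";

// One measurement of one metric from one run. The field names are hypothetical
// placeholders, not the schema raptor actually writes.
interface MetricRecord {
  metric: string;  // e.g. "visuallyLoaded"
  value: number;   // the measured value for that run
  label: string;   // which group the run belongs to, e.g. "base" or "1"
}

// metrics.ldjson holds one JSON object per line, so parse it line by line.
function readLdjson(path: string): MetricRecord[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter(line => line.trim() !== "")
    .map(line => JSON.parse(line) as MetricRecord);
}

// All values of one metric for one group, e.g. every visuallyLoaded run
// recorded before the patch was applied.
function valuesFor(records: MetricRecord[], metric: string, label: string): number[] {
  return records
    .filter(r => r.metric === metric && r.label === label)
    .map(r => r.value);
}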

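And a rough sketch of the kind of significance test the p-value column
describes: a Welch's two-sample t-test, here with a simple normal approximation
for the p-value (reasonable once each group has a few dozen runs). This is just
an illustration of the idea, not raptor-compare's actual implementation.

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample variance (n - 1 in the denominator); assumes at least two runs per group.
function variance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1);
}

// Abramowitz & Stegun formula 7.1.26 approximation of the error function
// (absolute error below ~1.5e-7), enough for a quick p-value estimate.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Two-sided p-value for "both groups have the same mean".
function pValue(base: number[], patched: number[]): number {
  // Welch's t statistic: difference of means over its estimated standard error.
  const se = Math.sqrt(variance(base) / base.length + variance(patched) / patched.length);
  const t = (mean(patched) - mean(base)) / se;
  // Approximate the t distribution with a standard normal for the tail probability.
  const phi = 0.5 * (1 + erf(Math.abs(t) / Math.SQRT2)); // P(Z <= |t|)
  return 2 * (1 - phi);
}

Combined with the helpers in the previous sketch, something like
pValue(valuesFor(records, "visuallyLoaded", "base"), valuesFor(records, "visuallyLoaded", "1"))
gives a number to compare against the 0.05 threshold, which is what the
asterisk in the table above flags.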
