On Mon, Jul 6, 2009 at 10:11 AM, Geoffrey Garen <gga...@apple.com> wrote:
>> So, what you end up with is after a couple of years, the slowest test in
>> the suite is the most significant part of the score. Further, I'll predict
>> that the slowest test will most likely be the least relevant test, because
>> the truly important parts of JS engines were already optimized. This has
>> happened with Sunspider 0.9 - the regex portions of the test became the
>> dominant factor, even though they were not nearly as prominent in the real
>> world as they were in the benchmark. This leads to implementors optimizing
>> for the benchmark - and that is not what we want to encourage.
>>
>
> How did you determine that regex performance is "not nearly as prominent in
> the real world?"
>

For a while regex was 20-30% of the benchmark on most browsers, even though it didn't consume 20-30% of the time that browsers spent inside JavaScript. So, I determined this through profiling. If you profile your browser while browsing websites, you won't find that it spends 20-30% of its JavaScript execution time running regex (even with the old PCRE). It's more like 1%.

If this is true, then it's a shame to see regex consume 20-30% of any benchmark, because it means the benchmark score is not indicative of the real world. Maybe I just disagree that the mix was ever very representative? Or maybe it changed over time? I don't know, because I can't go back in time :-) Perhaps one solution is to better document how a mix is chosen.

I don't really want to turn this into a he-said/she-said debate about how expensive regex is. We should talk about the framework. If the framework is subject to this type of skew, where it can disproportionately weight a test, is that something we should avoid? Keep in mind I'm not recommending any change to the existing SunSpider 0.9 - just changes to future versions.
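
To make the framework question concrete, here is a rough sketch in Python of how a summed-time score and a geometric-mean score react when one test stops getting faster. The test names and timings are made up for illustration; they are not real SunSpider data, and the helper functions are hypothetical, not part of any harness.

```python
import math


def summed_score(times):
    """Score as the plain sum of per-test times (lower is better)."""
    return sum(times.values())


def geometric_mean_score(times):
    """Score as the geometric mean of per-test times (lower is better)."""
    logs = [math.log(t) for t in times.values()]
    return math.exp(sum(logs) / len(logs))


def share_of_sum(times, name):
    """Fraction of the summed score contributed by one test."""
    return times[name] / summed_score(times)


def halve(times, name):
    """Return a copy of the timings with one test made 2x faster."""
    faster = dict(times)
    faster[name] = times[name] / 2
    return faster


# Hypothetical timings in ms: four equally weighted tests at first.
before = {"regex": 100.0, "math": 100.0, "string": 100.0, "bitops": 100.0}
# Assumed later state: everything except regex is now 10x faster.
after = {"regex": 100.0, "math": 10.0, "string": 10.0, "bitops": 10.0}

regex_share_before = share_of_sum(before, "regex")
regex_share_after = share_of_sum(after, "regex")

# How much does each score improve if one test gets 2x faster?
sum_ratio_regex = summed_score(halve(after, "regex")) / summed_score(after)
sum_ratio_math = summed_score(halve(after, "math")) / summed_score(after)
gm_ratio_regex = (geometric_mean_score(halve(after, "regex"))
                  / geometric_mean_score(after))
gm_ratio_math = (geometric_mean_score(halve(after, "math"))
                 / geometric_mean_score(after))

print(f"regex share of sum: {regex_share_before:.0%} -> {regex_share_after:.0%}")
print(f"sum after 2x regex speedup: {sum_ratio_regex:.3f} of previous")
print(f"sum after 2x math speedup:  {sum_ratio_math:.3f} of previous")
print(f"geomean after 2x regex speedup: {gm_ratio_regex:.3f} of previous")
print(f"geomean after 2x math speedup:  {gm_ratio_math:.3f} of previous")
```

With these invented numbers, regex goes from 25% of the summed score to about 77% without getting any slower. A 2x regex speedup then cuts the sum by roughly 38%, while a 2x speedup in "math" cuts it by about 4%. Under the geometric mean, either 2x speedup improves the score by the same factor (about 0.84), whichever test it lands on. Whether equal weighting is actually what we want is a separate question.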

Maciej pointed out a case where he thought the geometric mean was worse; I think that's a fair consideration if you have the perfect benchmark with an exactly representative workload. But we don't have the ability to make a perfectly representative benchmark workload, and even if we did, it would change over time - eventually making the benchmark useless...

Mike
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev