Bryan Stearns wrote:
> where we focus on those problems. However, I don't think
> performance-regression bugs targeting a particular commit are useful or
> positive for me as a developer:
>
> - Performance analysis works better when all the changes affecting a
> metric are in place: we can analyze the whole chain, not just one piece
> at a time.
This is where we have differing opinions. I think it is easier to look at
an individual change to understand why performance changed, rather than to
start from an analysis of the whole chain.

> - It's not like we can just back out the commit and discard the feature
> requirement. Most of the time, it's not possible to implement new
> features without some performance cost.

I have a background in a project where performance regressions were
considered extremely serious, and many a feature went back to the drawing
board because the initial implementation was not performant enough. I
understand we are not even in a position to attempt that with Chandler
yet, because there are too many critical features missing, and doing it
the hard way can slow feature development significantly. I also
understand that in some cases performance costs cannot be avoided. Having
said that, AFAIK I have not asked for any checkin to be backed out
because of a performance regression.

> - It's *really* demoralizing to work hard on a feature, then have a
> performance-regression bug filed against it a day or so later (usually
> after you've started to dig into the next feature).

I am sorry; that has certainly not been my intent in filing the bugs. I
believe the wording I have used was along these lines: "please check if
the checkin contained obvious performance bugs, and if so, fix them;
otherwise mark as invalid". I did not mean to criticize the feature or
the implementation.

> I also have trouble with our performance-monitoring mechanisms: many of
> the measurements vary widely, even when run with the same version of the
> code: here's 19 runs against a single revision on a single platform, and
> the standard deviation is 1/3 of the average time!
>
> http://builds.osafoundation.org/perf_data/detail_20070415.html#creating_a_new_event_in_the_cal_view_after_large_data_import.double_click_in_the_calendar_view

This is a problem. John's script record/playback could potentially
stabilize some results, but it has been quite fragile and does not yet
have the functionality to replace all the tests. In some cases, running
the tests with indexing disabled may give results with less deviation;
however, I am slightly against doing this on Tinderbox, because users
will run with the indexer on. As a developer you can of course make that
change locally if it stabilizes the test results. Also, if the test
framework gets in the way (as it does in many cases), we as developers
can and should modify our local code and insert the profiler calls
ourselves where appropriate.

> That's a weekend day: on a weekday, or a slower platform, there may be
> only one perf run of each revision (or none at all). Because of this,
> the graphs cover too short a period to reliably see the effect of a
> single commit.

This is also a problem. One thing I have been hoping to do is make the
tests report on the last 24-hour period instead of just the current day.
However, the reports were originally coded with per-day reporting in
mind, and it will be a fair amount of work to change; so far other tasks
have seemed more important. Also, it takes a long time to run the
performance tests, and I would like to run each test 5 times instead of
the current 3 to get a bit more stability. We could get faster hardware,
but we would still need to run the tests on the reference platforms as
well.
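To make the variance issue concrete, here is a minimal sketch
(hypothetical helper names, not part of the Chandler performance
framework) of how a developer could time a test locally over several
runs and look at the spread; when the standard deviation is around a
third of the mean, a single run says very little, and five runs give a
somewhat steadier median and mean than three:

    import statistics
    import time

    def time_runs(test, runs=5):
        """Run `test` `runs` times; return the wall-clock times in seconds."""
        times = []
        for _ in range(runs):
            start = time.time()
            test()
            times.append(time.time() - start)
        return times

    def report(times):
        mean = statistics.mean(times)
        stdev = statistics.stdev(times) if len(times) > 1 else 0.0
        median = statistics.median(times)
        print("runs=%d median=%.3fs mean=%.3fs stdev=%.3fs (%.0f%% of mean)"
              % (len(times), median, mean, stdev,
                 100.0 * stdev / mean if mean else 0.0))

    if __name__ == "__main__":
        # Stand-in workload; replace with the test being measured, or wrap
        # it in cProfile.runctx() locally when the framework gets in the way.
        report(time_runs(lambda: sum(i * i for i in range(200000))))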
> (While we're on the subject: I also don't like the way we state our
> performance targets: If we say that 1 second is "acceptable", but the
> "goal" is .1 seconds, I'm going to stop looking at a problem once it
> reaches "acceptable" and switch to another problem, and won't try to
> improve the first one further until all the other metrics are at the
> "acceptable" level -- and probably not until after all other bugs are
> fixed, too, which hasn't happened yet. I'd be happier if the table on

I believe that is exactly how it is supposed to work.

> the tbox page used a shade of green once a measurement got to
> "acceptable", and a brighter shade if it got to the "goal": the table
> would make us look a lot less screwed than the red and orange mess there
> now, which makes it look like we're making no progress at all.)

Personally I prefer the orange, since it is an improvement over red, and
orange still means we are kind of screwed but OK for Preview. Showing
green would seem a bit like lying to me. Of course, if the majority want
to change the colors, I can do that. If not, with many browsers you can
use a user.css stylesheet and color the entries anything you like:

http://www.squarefree.com/userstyles/

--
Heikki Toivonen
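For what it's worth, here is a purely illustrative sketch (a hypothetical
function, not the actual tbox report code) of the two-threshold
classification being discussed; the color attached to each band is then
just a presentation choice:

    def classify(seconds, goal, acceptable):
        """Classify a measurement against its goal and acceptable limits."""
        if seconds <= goal:
            return "goal"          # bright green in either scheme
        if seconds <= acceptable:
            return "acceptable"    # orange today; a paler green in Bryan's proposal
        return "unacceptable"      # red

    # Example: a 0.7 s measurement with a 0.1 s goal and a 1 s acceptable limit.
    print(classify(0.7, goal=0.1, acceptable=1.0))   # -> "acceptable"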
