Re: Benchmarking kaffe (Was: Re: SPECjvm98)
On Tuesday 26 March 2002 14:07, Jukka Santala wrote:
> On Mon, 25 Mar 2002, Dalibor Topic wrote:
> > Sticking to a standard environment would just limit the number of
> > people able to contribute results.
>
> That's one of the things I'm afraid of. The last thing we want is people
> upgrading their compiler/libraries on the run, and forgetting to mention
> it in the benchmarks, leading everybody to think they've broken
> something terribly, or found a new optimization.

O.K. Writing a script or a java program that collects --version
information for the tools and libraries used should be possible. I'm in
favor of automating the process as much as possible: 'make benchmark', and
you get a bench.txt file at the end with all relevant configuration
information and the results. Put a benchmark toolchain definition for that
release somewhere where it gets parsed by the benchmark script, and let
the script flag non-standard entries.

> > What kind of contribution process would be suitable for such an
> > effort? Emails to a specific mailing list? Web forms?
>
> Well, I was initially thinking of having both a gnuplot graph of the
> development of benchmark performance over time, and a textual log of the
> specific results. In the simplest case, this would only require an
> e-mail notification of the location of the graphs to this list, and the
> URL could then be added to the official web page if deemed
> useful/reliable enough. If enough data is provided, it might be worth
> writing a script on the web-site machine that gathers the benchmark logs
> and collates combined graphs from them.

Sounds right.

> But, as implied, if we're aiming for just any benchmark, for posterity
> and some pretend-comparisons between system performances, then all bets
> are off, and we should probably have some sort of web form for users to
> type in "Herez the rezultz I gotz from running my own number-calculation
> benchmark, calculating how many numbers there are from 1 to 1000 while
> playing Doom in another window. This is OBVIOUSLY what everybody else
> will be doing with the VMs, so I think this counts. I'm not sure I even
> have a compiler. ;)"

Uh, no, thanks :) But that raises an interesting question: which
benchmarks would matter? For example, I assume that benchmarking kaffe as
a platform for apache.org's java projects might be interesting, since a
couple of responses to the most popular applications thread mentioned
those. What could Ashes cover?

dalibor topic
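[A minimal sketch of the version-collecting step discussed above, in Java
since the message suggests "a script or a java program". The bench.txt
file name comes from the message; the class name, the tool list, and the
assumption that each tool answers to --version are illustrative, not an
existing part of kaffe's build.]

import java.io.*;

// Hypothetical sketch: run each tool with --version and append the first
// line of its output to bench.txt, so benchmark results always carry the
// toolchain configuration they were produced with.
public class BenchEnv {
    // Tools whose versions we want to record; adjust to the local setup.
    static final String[] TOOLS = { "gcc", "make", "kaffe" };

    public static void main(String[] args) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter("bench.txt", true));
        for (int i = 0; i < TOOLS.length; i++) {
            out.println(TOOLS[i] + ": " + firstVersionLine(TOOLS[i]));
        }
        out.close();
    }

    // Runs "tool --version" and returns the first line of its output,
    // or a note if the tool could not be run.
    static String firstVersionLine(String tool) {
        try {
            Process p = Runtime.getRuntime().exec(
                new String[] { tool, "--version" });
            BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
            String line = r.readLine();
            r.close();
            return (line != null) ? line : "(no output)";
        } catch (IOException e) {
            return "(not found)";
        }
    }
}

[A 'make benchmark' target could run this before the benchmarks proper and
then append the results to the same file.]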
Benchmarking kaffe (Was: Re: SPECjvm98)
On Monday 25 March 2002 10:32, Jukka Santala wrote:
> On Sat, 23 Mar 2002, Dalibor Topic wrote:
> > Do you have any specific benchmarks in mind?
>
> Not particularly, but despite the subject line of this thread, I agree
> with the view that we should probably stay with open-source benchmarks.
> Ashes looks like a good alternative, and even has a Kaffe regression
> tests section, oddly enough.

I assume that's because the people who put together ashes are also
developing a bytecode optimization tool called soot. They use ashes to
evaluate its performance. Wild speculation: they managed to successfully
crash different VMs with the resulting bytecodes, thus the regression
tests.

I was also thinking about the java grande benchmarks, scimark 2.0 and
jMocha (I have not checked their licensing status yet). While they are
application specific, they are a smaller download than ashes, allowing
more people to participate.

> I think a more interesting question is whether we should try to agree on
> a standard runtime environment; the compiler and libraries can have a
> much bigger effect on performance than the JVM/KaffeVM in question.
> Primarily, this doesn't matter as long as the environment stays the same
> from test to test, or changes are explicitly noted, but to improve
> comparability of optimizations between platforms etc. there might still
> be use for agreeing on such. It seems like GCC 2.96 would be the
> preference, as this is a common standard, although moving to the 3.x
> series should be considered. How about libraries, though? This is a
> tougher nut to crack, although admittedly a proper VM benchmark
> shouldn't depend so much on library performance.

I agree that environments don't matter as long as one is comparing the
results to one's earlier runs. I doubt that comparing one's results with
those of others will lead to anything beyond wild speculation. If I recall
it correctly, the original motivation was to notice and avoid performance
regressions on platforms unavailable to the patch developer.

Different Linux distributions, for example, ship different versions of gcc
with their latest offerings. Requiring that everyone compiles kaffe with
version x just puts an additional roadblock before people can evaluate
performance. I would rather like to see whether there are any regressions
on environments that people actually use, than on a synthetic one.

As you point out, there are a lot of external influences on VM
performance. I don't feel that we should specify a standard runtime
environment, since the standard environment is a moving target on any
platform. In my opinion benchmark results tend to become irrelevant after
some time anyway. Sticking to a standard environment would just limit the
number of people able to contribute results.

What kind of contribution process would be suitable for such an effort?
Emails to a specific mailing list? Web forms?

dalibor topic
Re: SPECjvm98
On Friday 22 March 2002 11:27, Jukka Santala wrote:
> On Thu, 21 Mar 2002, Dalibor Topic wrote:
>
> Now you're confusing me, as well. As noted above, modifying code on one
> platform can affect the performance of other platforms (or other code
> paths) negatively, and I doubt most developers bother to benchmark even
> on their own system after every little change that could've slowed
> things down. Hence it would be a good idea to provide performance
> history graphs on different platforms, so that people can see that "Aha,
> so something committed on April the 1st turned the Gibbering test 50%
> slower on both platforms X and Y, must have been the new
> gibber-collector I submitted..."

I agree. I test the stuff I (re)implement for performance, but usually
don't bother testing the performance of small spec conformance fixes.

I would, for example, like to introduce a kaffe.util.param package in the
not too distant future. There is a lot of parameter checking code in the
io libraries, and I'd like to factor that out into the param package. The
package would contain just final classes with static methods. So your
typical method call to BufferedReader.read(buf, off, len) would call
kaffe.util.param.Buffer.check(buf, off, len) before it goes on with your
read request. Since BufferedReader.read(char [], int, int) already checks
its parameters, this would in effect de-inline that code out of the read
method.

The resulting performance penalty would be eliminated by a JIT compiler
that can inline final methods. But it would remain on platforms stuck with
an interpreter. The interesting question to ask is: is the adverse effect
on performance on those platforms big enough to kill this idea? I could
only know if people submitted benchmark results for those platforms, once
the patch is out.

> > In general, if people run kaffe on benchmarks and put the results on
> > the web, I assume that Jim will link to them. If people volunteer to
> > do so regularly,
>
> That is somewhat what I had in mind. I'm a bit reluctant to volunteer,
> as I don't know how long I can contribute, but I should be able to put
> up performance tracking for AMD Duron and StrongARM at least. Intel
> platforms should be relatively easy to come by.

Sounds great. I could offer results on a Cyrix Psomething+.

Do you have any specific benchmarks in mind?

dalibor topic
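[A sketch of what the proposed helper could look like. The package, class,
and method names come straight from the message above; the exact checks
and exceptions are assumptions based on the documented contract of
BufferedReader.read(char[], int, int).]

package kaffe.util.param;

// Sketch of the proposed parameter-checking helper: a final class with
// static methods, so a JIT that inlines final methods can eliminate the
// extra call. The checks mirror what BufferedReader.read(char[], int,
// int) already does inline; the exact exceptions thrown are assumptions.
public final class Buffer {

    private Buffer() {
    }

    // Throws if (buf, off, len) does not describe a valid range in buf,
    // matching the contract of the java.io read/write methods.
    public static void check(char[] buf, int off, int len) {
        if (buf == null) {
            throw new NullPointerException();
        }
        // len > buf.length - off also catches off > buf.length without
        // risking integer overflow from off + len.
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
    }
}

[BufferedReader.read(char[], int, int) would then start with a single call
to kaffe.util.param.Buffer.check(buf, off, len) instead of carrying the
checks inline, which is exactly the de-inlining trade-off discussed above.]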
Re: SPECjvm98
On Tue, 19 Mar 2002, Dalibor Topic wrote:
> > Speaking of benchmarking suites, there is Ashes from the SableVM
> > people. http://www.sable.mcgill.ca/ashes/
>
> I think it's open source, so I'd prefer that. :)

Okay, thanks, I'll take a look at that. There are also some open source
conformance test-suites, I believe.

> I like the idea, but that only gives us a rough estimate about
> performance, not conformance. I think that running a performance
> benchmark on a web/cvs/ftp/build server under load will produce rough
> results most of the time.

Good catch; no, obviously you can't run the benchmark on a machine being
used for something else at the time and expect to get reasonable results.
However, since you'll have to run the benchmark on different architectures
anyway, you might as well run them on separate machines that are idle
overnight or the like. Also, it does give you a rough idea of conformance
if the benchmarks break too badly :) I was thinking more in terms of
breaking some optimization. The problem with optimizations is that they're
quite platform specific and sometimes take longer to run than simple
conformance/regression tests, so they're difficult for a single developer
to run. In addition, Kaffe has pretty good conformance, but it would
appear performance could use some work.

 -Jukka Santala
Re: SPECjvm98
On Thursday 21 March 2002 10:24, Jukka Santala wrote:
> On Tue, 19 Mar 2002, Dalibor Topic wrote:
> > > Speaking of benchmarking suites, there is Ashes from the SableVM
> > > people. http://www.sable.mcgill.ca/ashes/
> >
> > I think it's open source, so I'd prefer that. :)
>
> Okay, thanks, I'll take a look at that. There are also some open source
> conformance test-suites, I believe.

There is mauve. There is jacks (compiler conformance). Then there are the
regression test suites of various implementations, like libgcj's. I have
run kaffe with mauve. I haven't looked into other conformance test-suites.

> Also, it does give you a rough idea of conformance if the benchmarks
> break too badly :) I was thinking more in terms of breaking some
> optimization. The problem with optimizations is that they're quite
> platform specific and sometimes take longer to run than simple
> conformance/regression tests, so they're difficult for a single
> developer to run. In addition, Kaffe has pretty good conformance, but it
> would appear performance could use some work.

Optimizations in core library code should be beneficial to all platforms.
But I guess you are not talking about those, since they are not platform
specific. I don't think I understood the term "breaking some optimization"
properly. Do you mean breaking some benchmark? If so, yes, meaningful
benchmarks will take longer to run than conformance tests in general. You
have to run a benchmark for some time to avoid measuring side effects like
disk I/O, VM startup etc. Unless of course that's exactly what you want to
measure :) Conformance tests usually run for a couple of seconds each.

In general, if people run kaffe on benchmarks and put the results on the
web, I assume that Jim will link to them. If people volunteer to do so
regularly, like the nightly builds done by the flex people, someone could
write a couple of scripts to collect and present the information in a nice
way. All it takes is to pick the benchmarks, find volunteers and agree on
the procedure :)

I could submit results on x86-linux, interpreter/jit/jit3. But that's a
rather common platform, and not that interesting, I guess. Unless there is
a massive amount of interest in seeing how kaffe performs on old, slow
computers :)

dalibor topic
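[A minimal sketch of the kind of harness that avoids timing VM startup and
class loading, as the message above describes. The class name and the
stand-in workload are hypothetical, and it assumes the millisecond
resolution of System.currentTimeMillis() is good enough for trend graphs.]

// Hypothetical sketch: run the workload once to get class loading,
// verification, and JIT compilation out of the way, then time repeated
// runs, so the numbers reflect steady-state performance rather than VM
// startup or disk I/O.
public class MicroBench {

    // Stand-in workload; a real harness would plug in the benchmark here.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        workload(); // warm-up run, not timed

        int runs = 20;
        long start = System.currentTimeMillis();
        long sink = 0;
        for (int i = 0; i < runs; i++) {
            sink += workload(); // keep the result so it can't be
                                // optimized away entirely
        }
        long elapsed = System.currentTimeMillis() - start;

        System.out.println("average per run: " + (elapsed / runs)
            + " ms (checksum " + sink + ")");
    }
}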
Re: SPECjvm98
On Tuesday 19 March 2002 10:30, Jukka Santala wrote:
> On Mon, 18 Mar 2002, Jim Pick wrote:
> > Maybe we can order a copy for kaffe.org and put it on the server? I'm
> > sure TVT will spot the $50 or $100 bucks. Interested in that?
>
> Speaking of benchmarking suites, there is Ashes from the SableVM people.
> http://www.sable.mcgill.ca/ashes/

I think it's open source, so I'd prefer that. :)

> I think it would be a good idea to have it run on a daily autobuild from
> the CVS, with reports plotted and archived on the site, so that problems
> (as well as improvements) in the codebase could be spotted as early as
> possible. (Although, due to the variety of platforms supported, this
> would always be potentially a bit misleading.)

I like the idea, but that only gives us a rough estimate about
performance, not conformance. I think that running a performance benchmark
on a web/cvs/ftp/build server under load will produce rough results most
of the time.

Speaking of conformance: the japhar web site features a mauve results
page, which would be nice to have in a similar fashion for kaffe.

finally, have fun,

dalibor topic
Re: SPECjvm98
They test kaffe using it here:

  http://www.shudo.net/jit/perf/

There are some issues (can't load it as an applet, needs a larger heap).

Maybe we can order a copy for kaffe.org and put it on the server? I'm sure
TVT will spot the $50 or $100 bucks. Interested in that?

Cheers,

 - Jim

----- Original Message -----
From: Erik Corry [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, March 18, 2002 12:27 AM
Subject: SPECjvm98

Hi

Does anyone know whether kaffe can run SPECjvm98? Before I plonk down $50
for a license...

--
Erik Corry [EMAIL PROTECTED]