Re: Benchmarking kaffe (Was: Re: SPECjvm98)

2002-03-26 Thread Dalibor Topic


On Tuesday 26 March 2002 14:07, Jukka Santala wrote:
 On Mon, 25 Mar 2002, Dalibor Topic wrote:
  Sticking to a standard environment would just limit the number of
  people able to contribute results.

 That's one of the things I'm afraid of. The last thing we want is people
 upgrading their compiler/libraries on the run, and forgetting to mention
 it in the benchmarks, leading everybody to think they've broken something
 terrible, or found a new optimization.

O.K. Writing a script or a Java program that collects --version information 
for the tools and libraries used should be possible. I'm in favor of automating 
the process as much as possible: run 'make benchmark' and you get a bench.txt 
file at the end with all relevant configuration information and the results. 
Put a benchmark toolchain definition for that release somewhere where it gets 
parsed by the benchmark script, and let the script flag non-standard entries.
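
Something along these lines could do the collecting; this is just a sketch, 
and the tool list, the --version flag and the output file name are my 
assumptions rather than anything that exists in the tree:

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

// Rough sketch only: run each toolchain binary with --version and append
// the first line of its output to bench.txt. The tool list, the flag and
// the file name are assumptions, not part of any existing build target.
public class ToolchainInfo {

    private static final String[] TOOLS = { "gcc", "make", "as" };

    public static void main(String[] args) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter("bench.txt", true));
        for (int i = 0; i < TOOLS.length; i++) {
            out.println(TOOLS[i] + ": " + firstLineOf(TOOLS[i] + " --version"));
        }
        out.close();
    }

    private static String firstLineOf(String command) throws IOException {
        Process p = Runtime.getRuntime().exec(command);
        BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        String line = r.readLine();
        r.close();
        return (line != null) ? line : "(no output)";
    }
}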

  What kind of contribution process would be suitable for such an
  effort? Emails to a specific mailing list? Web forms?

 Well, I was initially thinking of having both a gnuplot graph of the
 development of the benchmark performance over time, as well as a textual
 log of the specific results. In the simplest case, this would only
 require an e-mail notification of the location of the graphs to this list,
 and the URL could then be added to the official web-page if deemed
 useful/reliable enough. If enough data is provided, it might be worth it
 just to write a script on the web-site machine that would gather the
 benchmark logs and collate combined graphs from them.

sounds right.
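
For the textual log, one line per run with a date, a benchmark name and a 
score would already be enough for gnuplot to plot over time. A rough sketch 
of such an appender; the file name and the field layout are only my guesses, 
not an agreed format:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of the textual result log: one whitespace-separated line per run,
// "yyyy-MM-dd benchmark seconds", which gnuplot can then plot over time.
// The file name and the field layout are assumptions on my part.
public class ResultLog {

    public static void append(String benchmark, double seconds)
            throws IOException {
        String date = new SimpleDateFormat("yyyy-MM-dd").format(new Date());
        PrintWriter out = new PrintWriter(new FileWriter("results.log", true));
        out.println(date + " " + benchmark + " " + seconds);
        out.close();
    }
}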

 But, as implied, if we're aiming for just any benchmark, for posterity
 and some pretend-comparisons between system performances, then all bets
 are off, and we should probably have some sort of web-form for users to
 input in that "Herez the rezultz I gotz from running my own
 number-calculation benchmark, calculating how many numbers there are from
 1 to 1000 while playing Doom in another Window. This is OBVIOUSLY what
 everybody else will be doing with the VM's, so I think this counts. I'm
 not sure I even have a compiler." ;)

Uh, no, thanks :)

But that raises an interesting question: which benchmarks would matter? For 
example, I assume that benchmarking kaffe as a platform for apache.org's 
java projects might be interesting, since a couple of responses to the most 
popular applications thread mentioned them. What could Ashes cover?

dalibor topic





Benchmarking kaffe (Was: Re: SPECjvm98)

2002-03-25 Thread Dalibor Topic


On Monday 25 March 2002 10:32, Jukka Santala wrote:
 On Sat, 23 Mar 2002, Dalibor Topic wrote:
  Do you have any specific benchmarks in mind?

 Not particularly, but despite the subject-line on this thread, I agree
 with the view that we should probably stay with open-source benchmarks.
 Ashes looks like a good alternative, and even has a Kaffe regression tests
 section, oddly enough.

I assume that's because the people who put together Ashes are also developing 
a bytecode optimization tool called Soot. They use Ashes to evaluate its 
performance. Wild speculation: they managed to successfully crash different 
VMs with the resulting bytecodes, hence the regression tests.

I was also thinking about the java grande benchmarks, scimark 2.0 and jMocha 
(I have not checked their licensing status yet). While they are application 
specific, they are a smaller download than ashes, allowing more people to 
participate.

 I think a more interesting question is whether we should try to agree on a
 standard runtime environment; the compiler and libraries can have a much
 bigger effect on performance than the JVM/KaffeVM in question. Primarily,
 this doesn't matter as long as the environment stays the same from test to
 test, or changes are explicitly noted, but to improve comparability of
 optimizations between platforms etc. there might still be use for agreeing
 on such. It seems like GCC 2.96 would be the preference, as this is the
 common standard, although moving to the 3.x series should be considered.
 How about libraries, though? This is a tougher nut to crack, although
 admittedly a proper VM benchmark shouldn't depend so much on library
 performance.

I agree that environments don't matter as long as one is comparing the 
results to one's earlier runs. I doubt that comparing one's results with 
those of others will lead to anything beyond wild speculation.

If I recall correctly, the original motivation was to notice and avoid 
performance regressions on platforms unavailable to the patch developer. 
Different Linux distributions, for example, ship different versions of gcc 
with their latest offerings. Requiring that everyone compile kaffe with 
version x just puts an additional roadblock before people can evaluate 
performance. I would rather see whether there are any regressions in the 
environments that people actually use than in a synthetic one.

As you point out, there are a lot of external influences on VM performance. I 
don't feel that we should specify a standard runtime environment, since the 
standard environment is a moving target on any platform. In my opinion, 
benchmark results tend to become irrelevant after some time anyway. Sticking 
to a standard environment would just limit the number of people able to 
contribute results.

What kind of contribution process would be suitable for such an effort? Emails 
to a specific mailing list? Web forms?

dalibor topic





Re: SPECjvm98

2002-03-23 Thread Dalibor Topic


On Friday 22 March 2002 11:27, Jukka Santala wrote:
 On Thu, 21 Mar 2002, Dalibor Topic wrote:
 Now you're confusing me, as well. As noted above, modifying code on one
 platform can affect the performance of other platforms (or other code
 paths) negatively, and I doubt most developers bother to benchmark even on
 their own system after every little change that could've slowed things
 down.

 Hence it would be a good idea to provide performance history graphs on
 different platforms, so that people can see that "Aha, so something
 committed on April the 1st turned the Gibbering test 50% slower on both
 platforms X and Y, must have been the new gibber-collector I submitted..."

I agree. I test the stuff I (re)implement for performance, but usually don't 
bother testing the performance of small spec conformance fixes.

I would, for example, like to introduce a kaffe.util.param package in the 
not too distant future. There is a lot of parameter checking code in the io 
libraries, and I'd like to factor that out into the param package. The 
package would contain just final classes with static methods. So your typical 
method call to BufferedReader.read(buf, off, len) would call 
kaffe.util.param.Buffer.check(buf, off, len) before it goes on with your read 
request.
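
Roughly like this; just a sketch, and the exceptions thrown are my assumption 
of what the io code checks for today:

package kaffe.util.param;

// Sketch of the proposed helper: a final class with static methods, so a
// JIT that can inline final methods removes the call overhead again. The
// exact exceptions thrown are an assumption of what the io code does now.
public final class Buffer {

    private Buffer() {
    }

    public static void check(char[] buf, int off, int len) {
        if (buf == null) {
            throw new NullPointerException();
        }
        if (off < 0 || len < 0 || off + len > buf.length) {
            throw new IndexOutOfBoundsException();
        }
    }
}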

Since BufferedReader.read(char [], int, int) already checks its parameters, 
this would in effect de-inline that code out of the read method. The 
resulting performance penalty would be eliminated by a JIT compiler that can 
inline final methods. But it would remain on platforms stuck with an 
interpreter.

The interesting question to ask is: Is the adverse effect on performance on 
those platforms big enough to kill this idea? I could only know if people 
submitted benchmark results for those platforms, once the patch is out.

  In general, if people run kaffe on benchmarks & put the results on the
  web, I assume that Jim will link to them. If people volunteer to do so
  regularly,

 That is somewhat what I had in mind. I'm a bit reluctant to volunteer, as I
 don't know how long I can contribute, but I should be able to put up
 performance-tracking for AMD Duron and StrongARM at least. Intel platforms
 should be relatively easy to come by.

Sounds great. I could offer results on a Cyrix Psomething+.

Do you have any specific benchmarks in mind? 

dalibor topic





Re: SPECjvm98

2002-03-21 Thread Jukka Santala


On Tue, 19 Mar 2002, Dalibor Topic wrote:
 Speaking of benchmarking suites, there is Ashes from the SableVM people.
 http://www.sable.mcgill.ca/ashes/
 I think it's open source, so I'd prefer that. :)

Okay, thanks, I'll take a look at that. There are also some open source
conformance test-suites, I believe.

 I like the idea, but that only gives us a rough estimate about performance, 
 not conformance.  I think that running a performance benchmark on a 
 web/cvs/ftp/build server under load will produce rough results most of the 
 time.

Good catch; no, obviously you can't run the benchmark on a machine being
used for something else at the time and expect to get reasonable results.
However, since you'll have to run the benchmark on different architectures
anyway, you might as well run them on separate machines that are idle
overnight or the like.

Also, it does give you a rough idea of conformance if the benchmarks break
too badly :) I was thinking more in terms of breaking some
optimization. The problem with optimizations is that they're quite
platform specific and sometimes take longer to run than simple conformance/
regression tests, so they're difficult for a single developer to run. In
addition, Kaffe has pretty good conformance, but it would appear
performance could use some work.

 -Jukka Santala




Re: SPECjvm98

2002-03-21 Thread Dalibor Topic


On Thursday 21 March 2002 10:24, Jukka Santala wrote:
 On Tue, 19 Mar 2002, Dalibor Topic wrote:
  Speaking of benchmarking suites, there is Ashes from the SableVM people.
  http://www.sable.mcgill.ca/ashes/
  I think it's open source, so I'd prefer that. :)

 Okay, thanks, I'll take a look at that. There are also some open source
 conformance test-suites, I believe.

There is mauve. There is jacks (compiler conformance). Then there are the 
regression test suites of various implementations, like libgcj's. I have run 
kaffe with mauve. I haven't looked into other conformance test-suites.

 Also, it does give you a rough idea of conformance if the benchmarks break
 too badly :) I was thinking more in terms of breaking some
 optimization. The problem with optimizations is that they're quite
 platform specific and sometimes take longer to run than simple conformance/
 regression tests, so they're difficult for a single developer to run. In
 addition, Kaffe has pretty good conformance, but it would appear
 performance could use some work.

Optimizations in core & library code should be beneficial to all platforms. 
But I guess you are not talking about those, since they are not platform 
specific.

I don't think I understood the term "breaking some optimization" properly. Do 
you mean breaking some benchmark? If so, yes, meaningful benchmarks will 
take longer to run than conformance tests in general. You have to run a 
benchmark for some time to avoid measuring side effects like disk I/O, VM 
startup etc. Unless of course that's exactly what you want to measure :) 
Conformance tests usually run for a couple of seconds each.
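
To make that concrete, here is the kind of timing loop I mean, with one 
untimed warm-up pass so that class loading and JIT compilation don't end up 
in the measurement; the workload is only a placeholder:

// Minimal timing sketch: one untimed warm-up pass to get class loading
// and JIT compilation out of the way, then a longer, measured run. The
// workload below is a placeholder, not a real benchmark kernel.
public class TimingSketch {

    public static void main(String[] args) {
        workload(1000);                          // warm-up, not measured

        long start = System.currentTimeMillis();
        double result = workload(1000000);       // measured run
        long elapsed = System.currentTimeMillis() - start;

        System.out.println("result " + result + ", elapsed " + elapsed + " ms");
    }

    private static double workload(int iterations) {
        double sum = 0.0;
        for (int i = 0; i < iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;    // returned and printed so the loop can't be removed
    }
}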

In general, if people run kaffe on benchmarks & put the results on the web, I 
assume that Jim will link to them. If people volunteer to do so regularly, 
like the nightly builds done by the flex people, someone could write a couple 
of scripts to collect & present the information in a nice way. All it takes 
is to pick the benchmarks, find volunteers and agree on the procedure :)

I could submit results on x86-linux, interpreter/jit/jit3. But that's a rather 
common platform, and not that interesting, I guess. Unless there is a massive 
amount of interest in seeing how kaffe performs on old, slow computers :)

dalibor topic







Re: SPECjvm98

2002-03-19 Thread Dalibor Topic


On Tuesday 19 March 2002 10:30, Jukka Santala wrote:
 On Mon, 18 Mar 2002, Jim Pick wrote:
  Maybe we can order a copy for kaffe.org and put it on the server?  I'm
  sure TVT will spot the $50 or $100 bucks.  Interested in that?

Speaking of benchmarking suites, there is Ashes from the SableVM people.
http://www.sable.mcgill.ca/ashes/
I think it's open source, so I'd prefer that. :)

 I think it would be a good idea to have it run on a daily autobuild from
 the CVS, with reports plotted & archived on the site, so that problems (as
 well as improvements) in the codebase could be spotted as early as
 possible. (Although, due to the variety of platforms supported, this would
 always be potentially a bit misleading)

I like the idea, but that only gives us a rough estimate about performance, 
not conformance.  I think that running a performance benchmark on a 
web/cvs/ftp/build server under load will produce rough results most of the 
time.

Speaking of conformance: the japhar web site features a mauve results page; 
something similar would be nice to have for kaffe.

finally,
have fun,

dalibor topic





Re: SPECjvm98

2002-03-18 Thread Jim Pick


They test kaffe using it here.

http://www.shudo.net/jit/perf/

There are some issues (can't load it as an applet, needs a larger heap).

Maybe we can order a copy for kaffe.org and put it on the server?  I'm sure
TVT will spot the $50 or $100 bucks.  Interested in that?

Cheers,

 - Jim

- Original Message -
From: Erik Corry [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, March 18, 2002 12:27 AM
Subject: SPECjvm98



 Hi

 Does anyone know whether kaffe can run SPECjvm98?  Before I plonk
 down $50 for a license...

 --
 Erik Corry [EMAIL PROTECTED]