Alexey Verkhovsky wrote:
> On 5/22/07, Charles Oliver Nutter <[EMAIL PROTECTED]> wrote:
>> but if that's one of the primary performance issues for rdoc

> Not sure if it will be a big win for RDoc. It may. Generally, making
> sure that RDoc internals are JIT-compiled would certainly help. As a
> matter of fact, I don't see -any- compiled methods in the profiler
> call tree when I run RDoc. So far, I don't quite know why - it
> certainly does compile some stuff (i.e., the JIT is not completely
> disabled).

If you're running code older than a few days, you won't see JIT-compiled methods under -rprofile. I hadn't yet turned on tracing for JIT-compiled methods; it's on now in trunk, and will be on for RC3.
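For reference, the profiling run described here looks something like the following. The script path is illustrative, and -rprofile just loads Ruby's standard-library profiler:

```shell
# Illustrative invocation only; the rdoc path and target file will vary.
# With current trunk (and RC3 onward), the flat profile this prints
# includes JIT-compiled methods as well as interpreted ones.
jruby -rprofile -S rdoc lib/my_file.rb
```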

> Like I said, so far there is no big win that can be seen at the
> interpreter level. No obvious bottleneck. Other than making sure that
> it's all somehow compiled to bytecode.

That's about the conclusion we've come to, but RDoc still feels a lot slower than it should be. Most code we run is no worse than 2x as slow as MRI, and in some cases it's faster...even in interpreted mode. For RDoc to be so slow, even with the JIT turned on, really says to me something's wrong.

> There is an obvious bottleneck at the app code level (where RDoc is
> the app). I could just say "let's sit down and rewrite
> rdoc/parse_rb.rb and irb/slex.rb in straight Java - this stuff is too
> close to the metal for dynamic execution". And it would speed things
> up umpteen times.
>
> But that's not an option if we are shooting for comparable interpreter
> performance and complete reuse of the MRI standard library.

Yes, it's awful code. The option of rewriting it has come up many times, but as you say, we don't want to have to rewrite code in Java to get it to go fast.

> At that level, I don't see any big wins available without a drastic
> change of direction. The JVM, however fast, should be slower than C
> for primitive byte-pushing. And that's exactly what's involved in
> RDoc. E.g., there are about 4 calls per -micro-second (NB: micro, not
> milli) to EvaluationState#evalInternal going on there. This method
> itself only takes 9-10% of CPU time (if you believe a profiler, which
> at that call rate is a big if).
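The per-character "byte-pushing" pattern being described can be sketched with a toy benchmark. This is not code from RDoc itself; it's an invented example showing the shape of the workload in rdoc/parse_rb.rb and irb/slex.rb, where every input character triggers several dynamic method calls:

```ruby
require 'benchmark'

# Hypothetical input: 2,000 copies of a one-line method definition.
SOURCE = "def hello; puts 'world'; end\n" * 2_000

def scan_chars(src)
  count = 0
  i = 0
  # Each iteration makes several dynamic calls ([], ==, +), mirroring
  # the call rate a lexer imposes on the interpreter core.
  while i < src.length
    count += 1 if src[i] == "e"
    i += 1
  end
  count
end

time = Benchmark.realtime { scan_chars(SOURCE) }
puts "scanned #{SOURCE.length} chars in #{(time * 1000).round(2)} ms"
```

At millions of such calls per second, even a small fixed cost per dynamic dispatch dominates the runtime, which is why this workload favors C (or compiled code) over an AST interpreter.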

The hope for EvaluationState#evalInternal is that it can be replaced with a bytecode engine in the near future. We have experimented with a YARV-based machine, and it proved significantly faster than straight-up interpretation (though it's not on trunk right now...it has a few bugs slowing it down). It's not too surprising to see evalInternal getting hit so hard, since most code is still being interpreted. But it's an obvious place to improve, and I think a bytecode engine may be the best way (aside from compilation to Java bytecode, of course).
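A toy illustration of why a bytecode loop can beat AST walking (this is not JRuby's actual design, and the instruction set here is invented for the example): the AST walker re-dispatches recursively on node type at every step, while the bytecode engine compiles once and then runs a flat dispatch loop over an array.

```ruby
# AST interpreter: one recursive dispatch per node, every time.
def eval_ast(node)
  case node[0]
  when :lit then node[1]
  when :add then eval_ast(node[1]) + eval_ast(node[2])
  when :mul then eval_ast(node[1]) * eval_ast(node[2])
  end
end

# Bytecode VM: a flat loop over precompiled instructions and an
# operand stack, YARV-style.
def run(code)
  stack = []
  code.each do |op, arg|
    case op
    when :push then stack.push(arg)
    when :add  then b = stack.pop; stack.push(stack.pop + b)
    when :mul  then b = stack.pop; stack.push(stack.pop * b)
    end
  end
  stack.pop
end

# Both evaluate (1 + 2) * 3.
ast  = [:mul, [:add, [:lit, 1], [:lit, 2]], [:lit, 3]]
code = [[:push, 1], [:push, 2], [:add], [:push, 3], [:mul]]
eval_ast(ast) # => 9
run(code)     # => 9
```

The win comes from paying the tree-traversal and node-type dispatch cost once at compile time instead of on every execution.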

- Charlie
