Alexey Verkhovsky wrote:
> On 5/22/07, Charles Oliver Nutter <[EMAIL PROTECTED]> wrote:
>> but if that's one of the primary performance issues for rdoc
> Not sure if it will be a big win for RDoc. It may. Generally, making
> sure that RDoc internals are JIT-compiled would certainly help. As a
> matter of fact, I don't see -any- compiled methods in the profiler
> call tree when I run RDoc. So far, I don't quite know why - it
> certainly does compile some stuff (i.e., JIT is not completely
> disabled).
If you're running code older than a few days ago, you won't see
JIT-compiled methods in -rprofile; I had not yet turned on tracing for
JIT-compiled methods. It's on now in trunk, and will be on for RC3.
> Like I said, so far there is no big win that can be seen at the
> interpreter level. No obvious bottleneck. Other than making sure that
> it's all somehow compiled to bytecode.
That's about the determination we've come to, but it still feels like
it's a lot slower than it should be. Most code that runs is no worse
than 2x as slow as Ruby, and in some cases it's faster...even in
interpreted mode. To have RDoc be so slow, even with the JIT turned on,
really says to me that something's wrong.
> There is an obvious bottleneck at the app code level (where RDoc is
> the app). I could just say "let's sit down and rewrite
> rdoc/parse_rb.rb and irb/slex.rb in straight Java - this stuff is too
> close to the metal for dynamic execution". And it would speed things
> up umpteen times.
>
> But that's not an option if we are shooting for comparable interpreter
> performance and complete reuse of MRI standard library.
Yes, it's awful code. The option of rewriting it has come up many times,
but as you say, we don't want to have to rewrite code in Java to get it
to go fast.
> At that level, I don't see any big wins available without a drastic
> change of direction. JVM, however fast, should be slower than C for
> primitive byte-pushing. And that's exactly what's involved in RDoc.
> E.g., there are about 4 calls per -micro-second (NB: micro, not milli)
> to EvaluationState#evalInternal going on there. This method itself
> only takes like 9-10% of CPU time (if you believe a profiler, which at
> that level of call rates is a big if).
The hope for EvaluationState#evalInternal is that it can be replaced
with a bytecode engine in the near future. We have experimented with a
YARV-based machine, and it proved to be significantly faster than
straight-up interpretation (though it's not on trunk right now...it has
a few bugs slowing it down). It's not too surprising to see
evalInternal getting hit so hard, since most code is still being
interpreted. But it's an obvious place to improve, and I think a
bytecode engine may be the best way (aside from compilation to Java
bytecode, of course).
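To make the distinction concrete, here's a minimal sketch (in Ruby, with
invented node types and opcodes - this is not JRuby's or YARV's actual
representation) of the same expression evaluated two ways: a recursive
AST walk, roughly what evalInternal does per node, versus a flat
stack-based bytecode loop in the YARV style, where the tree is flattened
once and evaluation becomes a tight dispatch loop:

```ruby
# AST walking: one dispatch per tree node, on every evaluation.
def eval_ast(node)
  case node[0]
  when :lit then node[1]
  when :add then eval_ast(node[1]) + eval_ast(node[2])
  when :mul then eval_ast(node[1]) * eval_ast(node[2])
  end
end

# Bytecode: flatten the tree once into a linear instruction array...
def compile(node, code = [])
  case node[0]
  when :lit then code << [:push, node[1]]
  when :add then compile(node[1], code); compile(node[2], code); code << [:add]
  when :mul then compile(node[1], code); compile(node[2], code); code << [:mul]
  end
  code
end

# ...then evaluation is a tight loop over that array with an explicit
# operand stack, instead of recursive descent through the tree.
def run_bytecode(code)
  stack = []
  code.each do |op, arg|
    case op
    when :push then stack.push(arg)
    when :add  then b = stack.pop; a = stack.pop; stack.push(a + b)
    when :mul  then b = stack.pop; a = stack.pop; stack.push(a * b)
    end
  end
  stack.pop
end

ast = [:add, [:lit, 2], [:mul, [:lit, 3], [:lit, 4]]]  # 2 + 3 * 4
eval_ast(ast)               # => 14
run_bytecode(compile(ast))  # => 14
```

The win comes from paying the tree-traversal cost once at compile time;
the hot loop then touches a flat array, which is also far friendlier to
a JIT than deeply recursive, polymorphic node evaluation.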
- Charlie
---------------------------------------------------------------------
To unsubscribe from this list please visit:
http://xircles.codehaus.org/manage_email