I've been doing some view rendering profiling tonight. Here are my
findings. Note these are Java 5 OS X numbers, because it's what I could
get a profiler to attach to. Java 6 will be faster, and I'll show it at
the end.
Starting best time: 0.830000 0.000000 0.830000 ( 0.829000)
Ruby 1.8.6 time: 0.510000 0.020000 0.530000 ( 0.537098)
Ruby 1.8.6 time: 0.500000 0.010000 0.510000 ( 0.508733)
The top entry when profiling is set up to only profile JITed Ruby code
is compile_template? in Rails. This function basically tries to
determine whether it should reload/recompiler a target view template by
doing...you guessed it (Tom), a stat call (specifically lstat). By
adding config.action_view.cache_template_loading = true that drops off
the profile. Note that turning off native code (i.e. JNA, which we use
to support stat) did not appear to have much affect on performance.
Tom gets a "told you so" here, but I always agreed there's probably a
stat happening in the view pipeline.
New best time: 0.714000 0.000000 0.714000 ( 0.714000)
The next item to show up in profiling was rails_asset_id, which
calculates the asset_id tag for a given resource. Again, it's native
calls involved here:
def rails_asset_id(source)
ENV["RAILS_ASSET_ID"] ||
File.mtime("#{RAILS_ROOT}/public/#{source}").to_i.to_s rescue ""
end
I'm not sure why they recalculate this every time, so to get around it
we'll specify a RAILS_ASSET_ID in our environment.
New best time: 0.631000 0.000000 0.631000 ( 0.630000)
The next highest entry in profiling is an unusual one: mon_exit, which
is defined in monitor.rb, which is used exclusively for one purpose:
synchronized logging. WTF! Logging uses monitor synchronization?!? Ok,
easy solution, turn off logging, and prepare to bitchslap whoever
thought that was a good idea (ok, so there are good reasons for it, but
in Rails we've presumably only got one thread logging anyway). In our
case, since we need logging to report numbers, we'll leave it on but
disable the use of monitor. Presumably, since logger uses monitor, we'll
want to wire up our own JRuby-smart version. Have a look at monitor.rb
and fear.
New best time: unchanged, but more consistently close to 0.63
At this point, the largest items in the profile appear to be actual
template rendering code. We'll pause here for now, since we can
benchmark rendering time separately and we've made progress already on
view rendering.
So then I promised Java 6 numbers. Here's numbers with Java 6 and the
changes above, and Ruby 1.8.6 with the RAILS_ASSET_ID env var and
template caching (I didn't monkey with the monitor stuff because monitor
has nearly zero overhead for MRI):
Java 6 time "soylatte"): 0.484000 0.000000 0.484000 ( 0.484000)
Ruby 1.8.6 time: 0.490000 0.000000 0.490000 ( 0.499294)
Yay! We win!
So these really simple tweaks bring us to parity on view rendering. From
here we can only improve, by making sure appropriate code is jitting
well, adding optimizations for bottlenecks we find in the rendering
pipeline, and so on. No worries.
- Charlie
---------------------------------------------------------------------
To unsubscribe from this list, please visit:
http://xircles.codehaus.org/manage_email