Also blogged here:
http://headius.blogspot.com/2007/10/another-performance-discovery-rexml.html
I've discovered a really awful bottleneck in REXML processing.
Look at these results for parsing our build.xml:
read content from stream, no DOM
(columns: user, system, total, and real time, in seconds)
2.592000 0.000000 2.592000 ( 2.592000)
1.326000 0.000000 1.326000 ( 1.326000)
0.853000 0.000000 0.853000 ( 0.853000)
0.620000 0.000000 0.620000 ( 0.620000)
0.471000 0.000000 0.471000 ( 0.471000)
read content once, no DOM
5.323000 0.000000 5.323000 ( 5.323000)
5.328000 0.000000 5.328000 ( 5.328000)
5.209000 0.000000 5.209000 ( 5.209000)
5.173000 0.000000 5.173000 ( 5.173000)
5.138000 0.000000 5.138000 ( 5.138000)
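For context, here's roughly the kind of harness that produces numbers
like the above (the no-op listener and the five iterations are
illustrative stand-ins, not the exact script):

  require 'benchmark'
  require 'rexml/document'
  require 'rexml/streamlistener'

  # A listener that ignores every event, so only parse time is measured
  class NullListener
    include REXML::StreamListener
  end

  # "read content from stream, no DOM": the parser pulls chunks from the IO
  5.times do
    File.open('build.xml') do |io|
      puts Benchmark.measure { REXML::Document.parse_stream(io, NullListener.new) }
    end
  end

  # "read content once, no DOM": the parser matches against one big String
  xml = File.read('build.xml')
  5.times do
    puts Benchmark.measure { REXML::Document.parse_stream(xml, NullListener.new) }
  end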
When reading from a stream, the content is read in chunks, and each
chunk is matched (and therefore encoded/decoded) in turn.
However, when a fully-read string is used in memory, matching proceeds
as follows:
1. set the buffer to the entire string
2. match against the buffer
3. set the buffer to the post-match remainder
Now this is obviously a little inefficient, but copy-on-write Strings
help a lot in MRI. In JRuby's case, however, this means we encode/decode
the entire remaining XML content for every element match. For any
nontrivial file, this is *terrible* overhead.
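Roughly, that loop looks like this (a simplified sketch of the pattern;
the token regexp is a placeholder, not REXML's real one):

  # Match-and-slice over an in-memory String
  NEXT_TOKEN = /\A[^<]*<[^>]*>/

  buffer = File.read('build.xml')   # 1. set buffer to the entire string
  until buffer.empty?
    md = NEXT_TOKEN.match(buffer)   # 2. match against the whole buffer
    break unless md
    # ... hand md[0] off to the tokenizer ...
    buffer = md.post_match          # 3. set buffer to the post-match remainder
  end

In C Ruby, md.post_match can share the underlying string data, which is
why copy-on-write keeps step 3 cheap there; in JRuby every match call
pays the encode/decode cost over the whole remaining buffer.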
So what's the fix? Here's the same second benchmark again, this time
passing a StringIO object to the parser (a sketch of the change follows
the results).
read content once, no DOM
0.640000 0.000000 0.640000 ( 0.640000)
0.693000 0.000000 0.693000 ( 0.693000)
0.542000 0.000000 0.542000 ( 0.542000)
0.349000 0.000000 0.349000 ( 0.349000)
0.336000 0.000000 0.336000 ( 0.336000)
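The change itself is a one-liner; a minimal sketch, with the same kind
of no-op listener as before:

  require 'stringio'
  require 'rexml/document'
  require 'rexml/streamlistener'

  class NullListener
    include REXML::StreamListener
  end

  xml = File.read('build.xml')
  # Wrapping the String in a StringIO sends REXML down its chunked
  # IO-reading path instead of re-matching the entire remaining String
  REXML::Document.parse_stream(StringIO.new(xml), NullListener.new)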
This is a perfect indication of why JRuby's Rails performance is nowhere
near where it could be. Of course, the original code would perform fine
once our Oniguruma port is complete, but this is a simple change to make
for now.
- Charlie