FYI:

I've been using libxml in some projects and have been getting into JRuby, which gives me access to Java XML libraries from Ruby. I thought people on this list might be interested in some simple benchmarking I did a couple of months ago.

I'm hoping to use Hpricot for general XML processing instead of REXML or libxml in some projects, so I wanted to find out the speeds of different XML parsers in MRI and JRuby.

* I was very impressed by how much faster JRuby is when running on Java 1.6 than on 1.5. On Java 1.6, Hpricot in JRuby was only about 10% slower than in MRI (2.273 vs. 2.056).

So far I have only one test: parsing a 100 KB XML file and counting a certain type of element. I'm planning to add more tests that cover more of the kind of processing I need to do.

This is the test:

Do this 100 times:
  - parse a 100k XML file and count the 466 leaf nodes
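
For readers who want to try something similar, here is a minimal sketch of that kind of test. It is not the benchmark code from the repository linked below: the tiny in-memory document (SAMPLE_XML) and the helper names (count_leaves, parse_and_count) are made up for illustration, and it only exercises REXML. It assumes Ruby's standard Benchmark.bmbm, which runs each report once as a rehearsal and then again for the reported timing.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical stand-in for the 100 KB test file: 5 groups, each with 2 leaf items.
SAMPLE_XML = '<root>' +
  (1..5).map { |i| "<group><item>#{i}</item><item/></group>" }.join +
  '</root>'

# Recursively count elements that have no child elements.
def count_leaves(element)
  children = element.elements.to_a
  return 1 if children.empty?
  children.inject(0) { |sum, child| sum + count_leaves(child) }
end

# Parse the XML string from scratch and count its leaf elements.
def parse_and_count(xml)
  count_leaves(REXML::Document.new(xml).root)
end

# bmbm runs every report twice: a rehearsal, then the measured pass.
Benchmark.bmbm do |x|
  x.report('rexml') { 100.times { parse_and_count(SAMPLE_XML) } }
end
```

Swapping in another parser would mean adding another x.report block with that library's own parse and traversal calls.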

The results shown below are the times after a "rehearsal". The times for JRuby are faster when the JVM has been "warmed-up". The rehearsal has no effect on the MRI timings.

Platform and method                         total time (s)
-----------------------------------------------------------
JRuby (Java 1.6.0) jdom_document_builder          0.363
MRI: libxml                                       0.389
JRuby (Java 1.6.0 server) jdom_document_builder   0.412
JRuby (server) jdom_document_builder              0.617
JRuby: jdom_document_builder                      1.451
MRI: hpricot                                      2.056
JRuby (Java 1.6.0 server) hpricot                 2.272
JRuby (Java 1.6.0) hpricot                        2.273
JRuby (server) hpricot                            3.447
JRuby: hpricot                                    6.198
JRuby (Java 1.6.0 server) rexml                   6.251
JRuby (Java 1.6.0) rexml                          6.356
MRI: rexml                                        7.624
JRuby (server) rexml                              9.609
JRuby: rexml                                     12.944

* I'd also like to add tests for Ruby 1.9.

The timings reported here are taken from the second time the 100x loop is run for each platform/library test so the JVM should be warmed up.

Tested on:

  MacBook Pro
  2.33 GHz Intel Core 2 Duo
    4 GB memory
  running MacOS X 10.5.2

  Ruby versions tested:
    MRI:   ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
    JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.5.0_13
    JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.6.0_03 (Soylatte)

  Library versions MRI:
    libxml-ruby 0.5.4
    hpricot 0.6

  Library versions JRuby:
    hpricot 0.6.161

More details are available in the links below:

Benchmark code and data checked into subversion here:
https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks

Trac:
http://trac.cosmos.concord.org/projects/browser/trunk/common/ruby/xml_benchmarks

* Hpricot uses code generated by Ragel, a state machine compiler that can produce C or Java code, for the initial parsing. Ragel's Java output supports only one code-generation style, and it is not the fastest. The style Hpricot chooses for generating the C code produces a larger executable but is faster.

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel
