[libxml-devel] simple benchmarks comparing libxml to alternative ruby xml parsing solutions

Stephen Bannasch Wed, 28 May 2008 18:52:40 -0700

FYI:

I've been using libxml in some projects and have been getting into JRuby, which gives me access to Java XML libraries from Ruby. I thought people on this list might be interested in some simple benchmarking I did a couple of months ago.

I'm hoping to use Hpricot for general XML processing instead of REXML or libxml in some projects, and I wanted to find out the speeds of different XML parsers in MRI and JRuby.

* I was very impressed by how much faster JRuby is when running on Java 1.6 than on 1.5. On Java 1.6, Hpricot in JRuby was only 10% slower than in MRI.

So far I've only got one test: parsing a 100k XML file and counting a certain type of element. I'm planning to add more tests that cover more of the kind of processing I need to do.


This is the test:

Do this 100 times:
  - parse a 100k XML file and count the 466 leaf nodes

The results shown below are the times after a "rehearsal". The times for JRuby are faster when the JVM has been "warmed up". The rehearsal has no effect on the MRI timings.
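A minimal sketch of this kind of test, assuming REXML as the parser and Ruby's standard Benchmark.bmbm, which runs a rehearsal pass before the timed pass. The generated sample document and the helper names sample_xml and count_leaves are made up for illustration; this is not the benchmark code in the repository:

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical stand-in for the 100k test file: a small document
# with a known number of leaf elements.
def sample_xml(leaves)
  items = (1..leaves).map { |i| "<item id=\"#{i}\">value #{i}</item>" }.join
  "<root><group>#{items}</group></root>"
end

# Parse the XML and count the elements that have no child elements.
def count_leaves(xml)
  doc = REXML::Document.new(xml)
  count = 0
  doc.each_recursive { |el| count += 1 unless el.has_elements? }
  count
end

xml = sample_xml(466)

# bmbm runs every report once as a rehearsal and then again for the
# timings it prints, which gives a JVM a chance to warm up.
Benchmark.bmbm do |bm|
  bm.report('rexml') { 100.times { count_leaves(xml) } }
end
```

Other parsers would be added as further bm.report blocks so that every library is timed on the same input.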


Platform and method                             total time
-----------------------------------------------------------
JRuby (Java 1.6.0) jdom_document_builder          0.363
MRI: libxml                                       0.389
JRuby (Java 1.6.0 server) jdom_document_builder   0.412
JRuby (server) jdom_document_builder              0.617
JRuby: jdom_document_builder                      1.451
MRI: hpricot                                      2.056
JRuby (Java 1.6.0 server) hpricot                 2.272
JRuby (Java 1.6.0) hpricot                        2.273
JRuby (server) hpricot                            3.447
JRuby: hpricot                                    6.198
JRuby (Java 1.6.0 server) rexml                   6.251
JRuby (Java 1.6.0) rexml                          6.356
MRI: rexml                                        7.624
JRuby (server) rexml                              9.609
JRuby: rexml                                     12.944
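The relative slowdowns can be worked out directly from the totals. A small sketch using three of the Hpricot rows from the table; the helper name percent_slower is made up for illustration, and reading the plain "JRuby" row as the Java 1.5 run is an assumption:

```ruby
# Total times copied from the table above.
TIMES = {
  'MRI: hpricot' => 2.056,
  'JRuby (Java 1.6.0) hpricot' => 2.273,
  'JRuby: hpricot' => 6.198 # assumed to be the Java 1.5 run
}

# Percentage by which `time` exceeds `baseline`, rounded to one decimal.
def percent_slower(time, baseline)
  ((time / baseline - 1.0) * 100).round(1)
end

baseline = TIMES['MRI: hpricot']
TIMES.each do |label, time|
  puts format('%-30s %6.1f%% slower than MRI', label, percent_slower(time, baseline))
end
```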

* I'd also like to add tests for Ruby 1.9.

The timings reported here are taken from the second time the 100x loop is run for each platform/library test, so the JVM should be warmed up.


Tested on:

  MacBook Pro
  2.33 GHz Intel Core 2 Duo
    4 GB memory
  running MacOS X 10.5.2

  Ruby versions tested:
    MRI:   ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]
    JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.5.0_13
    JRuby: ruby 1.8.6 (2008-03-20 rev 6255) [i386-jruby1.1RC3] on Java 1.6.0_03 (Soylatte)


  Library versions MRI:
    libxml-ruby 0.5.4
    hpricot 0.6

  Library versions JRuby:
    hpricot 0.6.161

More details are available in the links below:

Benchmark code and data checked into subversion here:
https://svn.concord.org/svn/projects/trunk/common/ruby/xml_benchmarks

Trac:
http://trac.cosmos.concord.org/projects/browser/trunk/common/ruby/xml_benchmarks

* Hpricot uses code generated by Ragel, a state machine compiler that can produce C or Java code, for the initial parsing. The Ragel => Java compiler can only produce one style of generated code, and it is not the fastest. The style Hpricot chose for generating the C code produces a larger executable and is faster.

