I set out to do some simple measurements to see how fast libxml is
compared to hpricot.

I made a little script with a ~4 MB XML document appended after __END__.

$ uname -s
CYGWIN_NT-5.1
$ gem list libxml

*** LOCAL GEMS ***

libxml-ruby (0.5.2.0)
    LibXML2 bindings for Ruby
$ head -18 ./xml-bm2.rb
#!/usr/bin/env ruby
require 'benchmark'
require 'hpricot'
require 'xml/libxml'

xml = DATA.read

Benchmark.bmbm { |b|
        b.report('hpricot') do
          Hpricot::XML(xml).search('data').each{}
        end
        b.report('libxml') do
          XML::Parser.string(xml).parse.find('//data').each{}
        end
}

__END__
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
$ ./xml-bm2.rb
Rehearsal -------------------------------------------
hpricot  14.407000   0.157000  14.564000 ( 15.686000)
libxml    0.796000   0.093000   0.889000 ( 42.462000)
--------------------------------- total: 15.453000sec

              user     system      total        real
hpricot  13.797000   0.000000  13.797000 ( 15.578000)
libxml    0.859000   0.016000   0.875000 ( 41.091000)

As you can see, hpricot finished parsing the XML (already loaded into
memory) roughly 2.6 times faster in real time than libxml: about 15.6
seconds versus about 41 seconds. The other figures for libxml are pretty
interesting too: it used less than one second of CPU time in total, yet
took over 40 seconds of wall-clock time. On top of that, while hpricot
processes the document my CPU is maxed out all the way through, whereas
during the libxml phase it's virtually idle.
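
To make the gap between the CPU columns and the "real" column concrete,
here is a minimal sketch using only the Benchmark module from the standard
library. cpu_and_real is a hypothetical helper name, and sleep is just a
stand-in for "waiting instead of computing"; it is an assumption for
illustration and says nothing about what libxml actually does during
those 40 seconds.

```ruby
require 'benchmark'

# Hypothetical helper: returns [cpu_seconds, real_seconds] for the block.
# cpu_seconds is user + system time of this process, i.e. the first two
# columns that Benchmark.bmbm prints.
def cpu_and_real
  t = Benchmark.measure { yield }
  [t.utime + t.stime, t.real]
end

# Busy work: the process computes the whole time.
busy_cpu, busy_real = cpu_and_real { 300_000.times { |i| Math.sqrt(i) } }

# Waiting: sleep is an illustrative stand-in for any blocking wait.
# Wall-clock time passes, but almost no CPU time is charged.
idle_cpu, idle_real = cpu_and_real { sleep 0.5 }

printf("busy  cpu=%.3f real=%.3f\n", busy_cpu, busy_real)
printf("idle  cpu=%.3f real=%.3f\n", idle_cpu, idle_real)
```

The libxml row above has the same shape as the "idle" case: tiny user and
system times next to a large real time.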

Does anyone have any clue as to why this may happen, and how to have
libxml live up to its potential?...
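
One way to narrow this down might be to time the parse step and the XPath
search separately, so it is at least clear which call the wall-clock time
goes into. A rough sketch, not a diagnosis: time_steps is a hypothetical
helper (it is not part of Benchmark or libxml), and the steps below are
trivial stand-ins so it runs without any gems installed.

```ruby
require 'benchmark'

# Hypothetical helper: runs each step in order, feeding the result of one
# step into the next, and returns [label, cpu_seconds, real_seconds] rows.
def time_steps(input, steps)
  value = input
  steps.map do |label, step|
    t = Benchmark.measure { value = step.call(value) }
    [label, t.utime + t.stime, t.real]
  end
end

# Stand-in steps for illustration only.
rows = time_steps("a b c", [
  ['split', lambda { |s| s.split }],
  ['count', lambda { |a| a.size }]
])

rows.each do |label, cpu, real|
  printf("%-8s cpu=%.3f real=%.3f\n", label, cpu, real)
end
```

For the script above, the steps would presumably be the same calls split
apart: lambda { |x| XML::Parser.string(x).parse } for parsing,
lambda { |doc| doc.find('//data') } for the search, and
lambda { |nodes| nodes.each{} } for the iteration.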

thx
mortee

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel
