I just set out to do some simple measurements to see how libxml's parsing speed compares to hpricot's.
I wrote a little script with a ~4 MB XML document appended after __END__.

$ uname -s
CYGWIN_NT-5.1
$ gem list libxml

*** LOCAL GEMS ***

libxml-ruby (0.5.2.0)
    LibXML2 bindings for Ruby
$ head -18 ./xml-bm2.rb
#!/usr/bin/env ruby

require 'benchmark'
require 'hpricot'
require 'xml/libxml'

xml = DATA.read

Benchmark.bmbm { |b|
  b.report('hpricot') do
    Hpricot::XML(xml).search('data').each{}
  end
  b.report('libxml') do
    XML::Parser.string(xml).parse.find('//data').each{}
  end
}
__END__
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
$ ./xml-bm2.rb
Rehearsal -------------------------------------------
hpricot  14.407000   0.157000  14.564000 ( 15.686000)
libxml    0.796000   0.093000   0.889000 ( 42.462000)
--------------------------------- total: 15.453000sec

              user     system      total        real
hpricot  13.797000   0.000000  13.797000 ( 15.578000)
libxml    0.859000   0.016000   0.875000 ( 41.091000)

As you can see, hpricot finished parsing and searching the XML (already loaded into memory) about 2.6 to 2.7 times faster in real time than libxml: roughly 15.6 s versus 41 to 42 s. The other libxml figures are also striking. Its user + system CPU time is under one second, yet its real time is over 40 seconds, so almost all of that time is spent waiting rather than computing.

On top of that, while hpricot processes the document my CPU is maxed out the whole time, whereas during the libxml phase it is virtually idle.

Does anyone have a clue why this happens, and how to get libxml to live up to its potential?

thx
mortee

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel
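A note on reading the Benchmark columns in the run above: `user`, `system` and `total` are CPU time charged to the Ruby process, while `real` is wall-clock time, so `real - total` is time the process spent off the CPU (typically waiting on I/O, the network, or something else outside the process). The sketch below computes that share from a `Benchmark::Tms`. The `idle_fraction` helper is a hypothetical name, not part of Ruby's Benchmark library, and the inputs are simply the second-pass figures copied from the run.

```ruby
require 'benchmark'

# Hypothetical helper (not part of Ruby's Benchmark API): the share of
# wall-clock time during which the measured process was not on the CPU.
# Benchmark::Tms#total is utime + stime + cutime + cstime; #real is wall clock.
def idle_fraction(tms)
  return 0.0 if tms.real <= 0
  idle = tms.real - tms.total
  idle > 0 ? idle / tms.real : 0.0
end

# Second-pass (post-rehearsal) figures copied from the benchmark run.
# Argument order: Benchmark::Tms.new(utime, stime, cutime, cstime, real, label)
hpricot = Benchmark::Tms.new(13.797, 0.000, 0.0, 0.0, 15.578, 'hpricot')
libxml  = Benchmark::Tms.new(0.859, 0.016, 0.0, 0.0, 41.091, 'libxml')

[hpricot, libxml].each do |t|
  printf("%-8s idle %5.1f%% of real time\n", t.label, 100 * idle_fraction(t))
end

# A live check: a block that only sleeps should come out almost entirely idle.
puts format('sleep    idle %5.1f%% of real time',
            100 * idle_fraction(Benchmark.measure { sleep 0.2 }))
```

On those figures hpricot is off the CPU for about 11% of its real time and libxml for about 98%, which matches the observation that the CPU sits idle during the libxml phase. The measurement alone does not say what libxml is waiting on.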