At 10.59 09/03/2007 +0100, Michael Schmidt wrote:
Hi,
> At 09.47 09/03/2007 +0100, Alberto Massari wrote:
> >Hi Michael,
> >I recently fixed a performance problem in the SAX2 parser; could you
> >try using the SAX1 parser (via the SAXCount example) just to double check?
>
> Also, you should modify the SAXCount example to add
>
> parser->useScanner(XMLUni::fgWFXMLScanner);
>
> that installs a scanner that ignores the DTD, XML Schema, and every
> other check apart from well-formedness.
All right, I just ran some more benchmarks with the SAX1 parser and
the scanner replaced as described above. Indeed, throughput is
significantly higher:
XMARK 10MB = 31.25
XMARK 100MB = 34.4827586206897
XMARK 1000MB = 41.101520756268
XMARK 5000MB = 38.13882532418
MEDLINE = 27.7027027027027
PROTEIN = 25.2954209748892
(all values are MB/s)
In the earlier runs (with SAX2), all values were between 8 MB/s and 12 MB/s.
Is the bug fixed in the latest nightly build? Would you recommend that
I use it for my benchmarks? Can I expect an additional speedup with
SAX2? Can I help you by providing additional benchmarks? In
principle, for my purposes even the SAX1 benchmarks would be fine.
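For anyone reproducing figures like the MB/s values above: the thread does
not say how they were measured, so the following is only a minimal sketch,
assuming throughput is input size divided by wall-clock parse time and that
1 MB means 1,000,000 bytes. The names throughputMBps and timeSeconds are
hypothetical helpers, not part of Xerces-C++; the callable passed to
timeSeconds stands in for whatever parse call is being measured.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Convert a byte count and an elapsed time into MB/s.
// Assumption: 1 MB = 1,000,000 bytes (the thread does not define the unit).
double throughputMBps(std::size_t bytes, double seconds) {
    if (seconds <= 0.0) {
        return 0.0;  // avoid dividing by zero for runs too short to time
    }
    return (static_cast<double>(bytes) / 1.0e6) / seconds;
}

// Run `work` once and return the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so system clock adjustments do not skew the result.
double timeSeconds(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();  // placeholder for the parse call under test
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

As an illustration with made-up inputs, throughputMBps(50000000, 2.0)
evaluates to 25 MB/s. Timing several runs and reporting the median would
probably reduce noise from disk caching.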
Michael,
the fix is in SVN (both 2.7 and 3.0 versions); you can see the
difference at http://svn.apache.org/viewvc?view=rev&revision=485700
In my opinion, SAX2 is slower than SAX1, as it has to deal with
namespaces; but it would be good if you could put numbers behind
this impression.
BTW, to select the well-formedness-only scanner with SAX2, you have to call
setProperty(XMLUni::fgXercesScannerName, (void*)XMLUni::fgWFXMLScanner)
Thanks,
Alberto