At 10.59 09/03/2007 +0100, Michael Schmidt wrote:
Hi,

> At 09.47 09/03/2007 +0100, Alberto Massari wrote:
> >Hi Michael,
> >I recently fixed a performance problem in the SAX2 parser; could you
> >try using the SAX1 parser (via the SAXCount example) just to double check?
>
> Also, you should modify the SAXCount example to add
>
>    parser->useScanner(XMLUni::fgWFXMLScanner);
>
> that installs a scanner that ignores DTD, XMLSchema and any other
> check apart from well-formedness.

All right, I just ran some more benchmarks with the SAX1 parser and the scanner replaced as described above. Indeed, the throughput is significantly higher:

XMARK 10MB   = 31.25
XMARK 100MB  = 34.4827586206897
XMARK 1000MB = 41.101520756268
XMARK 5000MB = 38.13882532418
MEDLINE      = 27.7027027027027
PROTEIN      = 25.2954209748892

(all values are MB/s)

In the earlier runs (with SAX2), all values were between 8 MB/s and 12 MB/s.

Is the bug fixed in the latest nightly build? Would you recommend that I use it for my benchmarks? Can I expect an additional speedup with SAX2? Can I help you by providing additional benchmarks? In principle, even the SAX1 benchmarks would be fine for my purposes.

Michael,
the fix is in SVN (both the 2.7 and 3.0 versions); you can see the difference at http://svn.apache.org/viewvc?view=rev&revision=485700

In my opinion, SAX2 is slower than SAX1, as it has to deal with namespaces; but it would be good if you could put numbers behind this impression.

BTW, in order to set the well-formedness scanner with SAX2, you have to call setProperty(XMLUni::fgXercesScannerName, XMLUni::fgWFXMLScanner)

Thanks,
Alberto
