Hi Michael,
if you called parser->useScanner(XMLUni::fgWFXMLScanner) on the SAX1 parser, you should call reader->setProperty(XMLUni::fgXercesScannerName, XMLUni::fgWFXMLScanner) on the SAX2 reader, so that both use the same scanner and yield comparable results. BTW, the WF in WFXMLScanner stands for well-formed: the scanner performs only the well-formedness checks (no DTD or XMLSchema validation). The other scanners available are DGXMLScanner (DTD validation only), SGXMLScanner (XMLSchema validation only) and IGXMLScanner (both DTD and XMLSchema validation).

Could you add the call and rerun the benchmark?
Thanks,
Alberto

At 16.47 09/03/2007 +0100, Michael Schmidt wrote:
Hi Alberto,

> BTW, in order to set the well-formed scanner, you have to call
> setProperty(XMLUni::fgXercesScannerName, XMLUni::fgWFXMLScanner)

what exactly do you mean by well-formed scanner? What I am looking for is a scanner without validation and without well-formedness checks. In the end, I am interested in the time needed to tokenize the input. Does such a scanner exist?

In the meantime, I reran my experiments (with the current SVN version). Here are the results:

Xerces SAX1 Interface:
----------------------
Data       size      real (s)   user (s)   sys (s)   cpu (%)   throughput (MB/s)
--------------------------------------------------------------------------------
XMark      10MB      0.83       0.3        0         38.33     33.33
XMark      100MB     5.02       2.72       0.08      54.66     35.71
XMark      1000MB    46.44      22.41      0.76      49.33     43.15
XMark      5000MB    241.04     116.96     4.03      49.66     41.32
MEDLINE    656MB     32.15      22.55      0.58      71.33     28.36
ProtSeq    685MB     32.79      25.94      0.54      80        25.86

Xerces SAX2 Interface:
----------------------
Data       size      real (s)   user (s)   sys (s)   cpu (%)   throughput (MB/s)
--------------------------------------------------------------------------------
XMark      10MB      0.77       0.43       0.01      59        22.72
XMark      100MB     5.82       4.2        0.08      73.33     23.36
XMark      1000MB    56.51      42.81      0.88      77        22.88
XMark      5000MB    292.28     214.1      4.38      74        22.88
MEDLINE    656MB     54.19      44.54      0.6       83        14.53
ProtSeq    685MB     56.37      45.9       0.59      82        14.73

I did not call "setProperty(XMLUni::fgXercesScannerName, XMLUni::fgWFXMLScanner)" in my experiments. In summary, SAX1 seems to be about a factor of 2 faster than SAX2, so the experiments confirm what you expected, provided that the scanners are comparable?

Kind regards
Michael