Dror Matalon wrote:

On Wed, Jan 07, 2004 at 07:24:22PM -0700, Scott Smith wrote:


After two rather frustrating days, I find I need to apologize to Lucene.  My
last run of 225 messages averaged around 25 milliseconds per message--that's
parsing the xml, creating the Document, and putting it in the index (2.5Ghz
cpu, 1G ram).  Turns out the performance problem was xerces sax "helping me"
by loading the DTD before it parsed each message and the DTD wasn't local to
our site.  After seeing Terry's response, I knew there had to be more going
on than what I was assuming.

Thanks for the suggestions. I wonder how much faster I can go if I
implement some of those?



25 msecs to insert a document is on the high side, but it depends of course on the size of your document. You're probably spending 90% of your time in the XML parsing. I believe that there are other parsers that are faster than xerces, you might want to look at these. You might want to look at http://dom4j.org/.

Dror



You may want to check the XML Pull Parser - it offers something between SAX and DOM, with performance similar to SAX. (http://www.extreme.indiana.edu/xgws/xsoap/xpp)

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to