Thanks, I'll resubmit my post over there. I just wasn't sure whether that mailing list was meant only for Xerces development rather than for menial troubleshooting problems.

----- Original Message -----
From: "Neil Graham" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, May 25, 2004 11:21 AM
Subject: Re: XML performance problems with xerces c++

> Hi Nath,
>
> You really don't want to use this list for such questions; better to use
> the Xerces-C-specific list here [*].
>
> But here are some thoughts: I don't understand what you mean when you
> write "It seems the larger the XML file, the longer it takes to parse
> individual nodes." When Xerces returns a DOM document to you, it has
> already parsed the entire document; it doesn't go off and parse more of it
> as you move down the list of children of the root element. And, if all you
> want is information from the children of the root element, you may well
> wish to use SAX; the DOM is inherently both processor- and
> memory-intensive.
>
> Cheers,
> Neil
>
> [*]: http://xml.apache.org/mail.html#xerces-c-dev
>
> Neil Graham
> XML Parser Development
> IBM Toronto Lab
> Phone: 905-413-3519, T/L 969-3519
> E-mail: [EMAIL PROTECTED]
>
> "Nath" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> cc:
> Subject: XML performance problems with xerces c++
> 05/24/2004 10:56 PM
> Please respond to general
>
> I converted over a dictionary of words and definitions into XML files (one
> file per letter of the alphabet), each weighing around 1-5 megs (I chose XML
> over a DB for important reasons). I'm trying to parse these files and it's
> taking an incredible amount of time to do it. When parsing small files
> (letters X, Y, and Z - a total of 815 words or 151 KB) the parser can do so
> in less than 2 seconds. When parsing the letter A file (40,000 some words or
> 1.58 megs), it takes 5 seconds just to parse 20 words. It seems the larger
> the XML file, the longer it takes to parse individual nodes. Can anyone
> suggest why this is happening and how I can fix it? I've used xerces c++
> 2.4.0 and recently upgraded to xerces 2.5.0.
>
> I'm just following the standard XML start-up and DOM parsing procedure
>
> - Initialize platform utils
> - Don't validate files
> - parse and assign DOM document
> - go through each child node and collect data
>
> I have a 1600MHz processor, so handling a few meg files should be fairly
> quick.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
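
One possible reading of the symptom described in the thread, offered only as an assumption because the original post does not show its traversal code: if the "go through each child node" step fetches children by index from a list backed by sibling links, every lookup walks from the first child again, so the cost per node grows with the size of the file even though parsing has already finished. The sketch below is a plain C++ model of that effect. `Node`, `SiblingList`, `itemAt`, `visitByIndex` and `visitBySibling` are hypothetical names for illustration and are not part of the Xerces-C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a DOM child node; NOT the Xerces-C++ API.
struct Node {
    int value = 0;
    Node* next = nullptr;  // next sibling, or nullptr for the last child
};

// Owns a chain of n sibling nodes linked through Node::next.
struct SiblingList {
    std::vector<std::unique_ptr<Node>> storage;  // keeps the nodes alive
    Node* first = nullptr;

    explicit SiblingList(std::size_t n) {
        storage.reserve(n);
        Node* prev = nullptr;
        for (std::size_t i = 0; i < n; ++i) {
            storage.push_back(std::make_unique<Node>());
            Node* cur = storage.back().get();
            cur->value = static_cast<int>(i);
            if (prev) prev->next = cur; else first = cur;
            prev = cur;
        }
    }
};

// Index-style access: starts at the first sibling on every call.
// 'hops' is incremented once per link followed.
Node* itemAt(const SiblingList& list, std::size_t index, std::size_t& hops) {
    Node* n = list.first;
    for (std::size_t i = 0; i < index && n; ++i) {
        n = n->next;
        ++hops;
    }
    return n;
}

// Visits every child by index; returns the total number of links followed.
std::size_t visitByIndex(const SiblingList& list) {
    std::size_t hops = 0;
    for (std::size_t i = 0; i < list.storage.size(); ++i) {
        Node* n = itemAt(list, i, hops);
        assert(n != nullptr);  // every index below size() must resolve
        (void)n;
    }
    return hops;
}

// Visits every child once by following sibling links; returns links followed.
std::size_t visitBySibling(const SiblingList& list) {
    std::size_t hops = 0;
    for (Node* n = list.first; n && n->next; n = n->next) {
        ++hops;
    }
    return hops;
}
```

In this model, visiting n children by index follows n(n-1)/2 links, while following sibling links follows n-1. For the 815-word case from the post that is 331,705 versus 814; for roughly 40,000 words it is 799,980,000 versus 39,999. If the real code does iterate by index, switching to a first-child/next-sibling walk, or to SAX as suggested above, would be the thing to try.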