On 10/14/2011 09:32 AM, coffeMan wrote:
> I got the solution resolved.....i am parsing over 11,000 different
> file types...it is going slow using the DOM Xml Parser...any ideas on
> how to improve performance?
>
> I cannot think of any other way to parse it

Well, first things first: let's clear up the terminology. I think you mean 11,000 different documents, all conforming to the same schema. You only have /one/ document type: XML. Please correct me if I have that wrong.

Short answer: Form a NodeList of "interesting" leaf nodes, and don't worry about the path from the document root to each leaf.

Long answer follows.

11,000 different documents is not unusual in a production environment. For example, consider the single DocBook schema, and the number of documents derived from that single schema.

Apparently, all you know is that the current document is well-formed. You do not know whether it's valid. Some might argue that you do not even know whether the document is well-formed, but let's assume the document was produced mechanically, and that all elements, attributes, and PCDATA are well-formed. So, you should only write code that relies on the document's physical structure, not its logical structure.

I think that the best you can do is to treat the document as a "flat space". Go directly to the child nodes of interest. There's probably nothing to gain by parsing the document as though it were a tree (which it is, I know...).

In other words, given what little I know about your specific problem, I believe you are probably just interested in leaf nodes. So, form a NodeList of those leaf nodes, and don't worry about the path from the document root to each leaf. The leaf nodes in the list will probably have different parents, but I don't think that matters in this instance.

Forget my earlier advice about GQuery. It's probably overkill, given what little I know about the problem you're trying to solve.

Bueno Suerte,
jec

>
> On Oct 14, 10:40 am, Jeffrey Chimene <jchim...@gmail.com> wrote:
>> On 10/14/2011 7:00 AM, coffeMan wrote:
>>
>>> I am retrieving XML from a servlet and parsing through it. the xml
>>> code is one large XML file but on through the servlet.
>>> I can parse through it easy and get my results but i am not sure
>>> when to stop it.
>>> I keep getting NullPointerException that kicks off when it reaches the
>>> end. I never know when its going to end because every file that i
>>> parse through is different in length.
>>
>>> I am using the DOM parser. Document messageDom =
>>> XMLParser.parse(srv.getXmlObject());
>>
>>> string name =
>>> messageDom.getElementsByTagName("name").item(n).getFirstChild().getNodeValue();
>>> - n being a variable that is an integer value that increases after
>>> each loop
>>
>> A couple of questions come to mind:
>>
>> 1) Are you sure the document is valid? Does there exist an XML schema
>> against which you can test this document instance? If not, you might
>> consider creating an XML schema, a sample document, and running the pair
>> through a validating parser such as xmllint. Such validation tests can
>> be a useful part of your overall product verification/validation regime.
>>
>> 2) Have you considered using GQuery to produce nodelists? For complex,
>> valid documents it can be a useful tool.
>>
>> 3) Consider using loops controlled by NodeList.length() instead of using
>> the builder pattern to process the tree. In my experience, using loops
>> instead of the builder pattern yields fewer surprises at runtime. I
>> realize there's a "cool factor" to chaining those method calls, but it
>> usually results in issues such as the one you're now trying to resolve.
>>
>> Bueno Suerte,
>> jec
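
For anyone reading this thread later, the advice in the reply (form a NodeList of the leaf nodes of interest and control the loop by the list's length, rather than chaining item(n).getFirstChild().getNodeValue() with an ever-increasing n) might look roughly like the sketch below. It is only a sketch under some assumptions: it uses the JDK's org.w3c.dom classes so that it runs standalone, whereas the original code uses GWT's XMLParser; the GWT NodeList should offer the same getLength() and item(int) shape, but check that against your GWT version. The class name LeafNames and the helpers collectNames and parse are made up for illustration, and the "name" tag is taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class LeafNames {

    // Hypothetical helper: collects the text of every <name> element,
    // regardless of where it sits in the tree ("flat space" view).
    public static List<String> collectNames(Document doc) {
        NodeList names = doc.getElementsByTagName("name");
        List<String> out = new ArrayList<>();
        // The loop is bounded by getLength(), so item(i) is never null here
        // and there is no need to guess where the document ends.
        for (int i = 0; i < names.getLength(); i++) {
            Node first = names.item(i).getFirstChild();
            // An empty element such as <name/> has no first child; skip it
            // instead of chaining into a NullPointerException.
            if (first != null && first.getNodeValue() != null) {
                out.add(first.getNodeValue());
            }
        }
        return out;
    }

    // Hypothetical helper: parses an XML string with the JDK DOM parser.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><a><name>x</name></a><b><name/><name>y</name></b></root>";
        System.out.println(collectNames(parse(xml)));
    }
}
```

The two <name> elements with text live under different parents and the empty <name/> is skipped, which is the case that tends to trigger the NullPointerException in the chained version. Note that getElementsByTagName is called once, outside the loop; calling it on every iteration, as the chained one-liner does, repeats the tree search each time and is a plausible contributor to the slowness on large inputs.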