On 10/14/2011 09:32 AM, coffeMan wrote:
> I got the solution resolved.....i am parsing over 11,000 different
> file types...it is going slow using the DOM Xml Parser...any ideas on
> how to improve performance?
> 
> I cannot think of any other way to parse it

Well, first things first: let's clear up the terminology. I think you
mean 11x10^3 different documents, all conforming to the same schema.
You only have /one/ document type: XML.

Please correct me if I have that wrong.

Short answer: Form a NodeList of "interesting" leaf nodes, and don't
worry about the path from the document root to each leaf.

Long answer follows.

11x10^3 different documents is not unusual in a production environment.
For example, consider the single DocBook schema, and the count of
documents derived from that single schema.

Apparently, all you know is that the current document is well-formed.
You do not know if it's valid. Some might argue that you do not even
know if the document is well-formed, but let's assume the document was
produced mechanically, and that all elements, attributes, and PCDATA are
well-formed.

So, you should only write code that relies on the document's physical
structure, not its logical structure.
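
To make the well-formed vs. valid distinction concrete, here is a rough
sketch. It assumes the plain JDK parser (javax.xml.parsers) as a
stand-in for GWT's XMLParser, and the class and method names are made up
for illustration. A non-validating parse that succeeds tells you the
document is well-formed, and nothing more.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports well-formedness only, never validity.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            // The default factory is non-validating: no schema or DTD checks.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Environment problem, not a statement about the document.
            throw new IllegalStateException(e);
        }
    }
}
```

isWellFormed("<a><b/></a>") returns true even though no schema was ever
consulted, which is why code downstream of such a parse should not
assume any particular nesting.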

I think that the best you can do is to treat the document as a "flat
space". Go directly to the child nodes of interest. There's probably
nothing to gain by parsing the document as though it were a tree (which
it is, I know...). In other words, given what little I know about your
specific problem, I believe you are probably just interested in leaf
nodes. So, form a NodeList of those leaf nodes, and don't worry about
the path from the document root to each leaf. The leaf nodes in the list
will probably have different parents, but I don't think that matters in
this instance.
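
Here is a minimal sketch of the "flat space" idea. It is written against
the JDK's org.w3c.dom classes so it can run outside GWT; as far as I
know, GWT's com.google.gwt.xml.client interfaces use the same method
names (getElementsByTagName, getLength, item, getFirstChild,
getNodeValue), but treat that as an assumption. The class name and the
sample XML are made up, and each leaf is assumed to hold plain text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: gathers leaf text without caring about parents.
class LeafValues {
    // Non-validating parse of an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Text of every <tagName> element, in document order, at any depth.
    static List<String> textOf(Document doc, String tagName) {
        // Fetch the list once; do not call getElementsByTagName per item.
        NodeList leaves = doc.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        // The list length bounds the loop, so item(i) is never null here.
        for (int i = 0; i < leaves.getLength(); i++) {
            Node first = leaves.item(i).getFirstChild();
            // An empty element such as <name/> has no first child.
            values.add(first == null ? "" : first.getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><a><name>x</name></a>"
                + "<b><c><name>y</name></c></b><name/></root>";
        System.out.println(textOf(parse(xml), "name"));
    }
}
```

The three name elements sit under three different parents, and the loop
never looks at them. Bounding the loop by getLength() is also what keeps
you from running off the end of the list.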

Forget my earlier advice about GQuery. It's probably overkill, given
what little I know about the problem you're trying to solve.

Good luck,
jec

> 
> On Oct 14, 10:40 am, Jeffrey Chimene <jchim...@gmail.com> wrote:
>> On 10/14/2011 7:00 AM, coffeMan wrote:
>>
>>> I am retrieving XML from a servlet and parsing through it.  the xml
>>> code is one large XML file but on through the servlet.  I can parse
>>> through it easy and get my results but i am not sure when to stop it.
>>> I keep getting NullPointerException that kicks off when it reaches the
>>> end.  I never know when its going to end because every file that i
>>> parse through is different in length.
>>
>>> I am using the DOM parser.  Document messageDom =
>>> XMLParser.parse(srv.getXmlObject());
>>
>>> string name =
>>> messageDom.getElementsByTagName("name").item(n).getFirstChild().getNodeValue();
>>> - n being a variable that is an integer value that increases after
>>> each loop
>>
>> A couple of questions come to mind:
>>
>> 1) Are you sure the document is valid? Does there exist an XML schema
>> against which you can test this document instance? If not, you might
>> consider creating an XML schema, a sample document, and running the pair
>> through a validating parser such as xmllint. Such validation tests can
>> be a useful part of your overall product verification/validation regime.
>>
>> 2) Have you considered using GQuery to produce nodelists? For complex,
>> valid documents it can be a useful tool.
>>
>> 3) Consider using loops controlled by NodeList.getLength() instead of
>> chaining method calls to walk the tree. In my experience, explicit
>> loops yield fewer surprises at runtime. I realize there's a "cool
>> factor" to chaining those method calls, but it usually results in
>> issues such as the one you're now trying to resolve.
>>
>> Good luck,
>> jec
> 
