On 10/21/02 5:36 PM, "Cory C. Omand" <[EMAIL PROTECTED]> wrote:

> I am attempting to process a number of XML datagrams that are coming back from
> a web service.  Essentially, there are multiple full XML documents being
> returned as one stream, and I would like to create a Document for each of
> them.  Can anyone give me a pointer as to where I should start looking?  I was
> thinking that some sort of SAX reader could separate each datagram into a
> String object, and then I could use the SAXReader.read(String) method to
> construct a Document object.  Does anyone here have experience doing something
> like this?  If so, I would appreciate a pointer in the right direction...

Hi,

I'm doing something similar and was able to handle it with a custom,
lightweight pre-parser. (We haven't reached the profiling stage yet, so I
don't know how much of a performance hit it costs, but I don't expect it to
be significant.) Unfortunately the code is part of a commercial product under
development, so I can't share it (we can discuss licensing offline if you're
interested). I also had to jump through a few extra hoops because we're using
NIO, so all of the input arrives asynchronously as CharBuffers.

However, the basic procedure is pretty straightforward. I started with the
XPP parser from www.xmlpull.org (it's fast, and the entire parser is in one
file). Then you just need to extract the parts that properly detect tags
(skipping over CDATA sections, comments, etc.) and track the depth of your
parse. Whenever the depth returns to 0, break off the string at that position
and feed it to the normal document builder. Since you already have the
character data in memory, you can remove the character reading and buffering
that the parser normally does and avoid creating entity strings, which saves a
lot of time and GC activity.
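A minimal sketch of that procedure in Java, assuming the whole stream is already in memory as a String. This is not the code described above (which isn't available); DatagramSplitter, split and parseAll are hypothetical names, DOCTYPE declarations are assumed to have no internal subset, and comments or processing instructions between documents are attached to the document that follows them. The JDK's built-in DOM DocumentBuilder stands in for the "normal document builder" so the example is self-contained. With dom4j you could pass each string to DocumentHelper.parseText(String) instead; note that dom4j's SAXReader.read(String) treats its argument as a system ID (a URL or file name), not as XML text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DatagramSplitter {

    /** Splits concatenated XML documents wherever the element depth returns to zero. */
    public static List<String> split(String s) {
        List<String> docs = new ArrayList<>();
        int depth = 0;
        int start = -1; // index where the current document began, -1 if between documents
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (start < 0) {
                if (Character.isWhitespace(c)) { i++; continue; } // whitespace between documents
                start = i;
            }
            if (c != '<') { i++; continue; } // character data and entity references
            int next;
            if (s.startsWith("<!--", i)) next = indexAfter(s, "-->", i + 4);
            else if (s.startsWith("<![CDATA[", i)) next = indexAfter(s, "]]>", i + 9);
            else if (s.startsWith("<?", i)) next = indexAfter(s, "?>", i + 2);
            else if (s.startsWith("<!", i)) next = indexAfter(s, ">", i + 2); // DOCTYPE, no internal subset
            else {
                next = tagEnd(s, i);
                if (next < 0) break;
                boolean isEnd = s.charAt(i + 1) == '/';
                boolean isEmpty = s.charAt(next - 2) == '/';
                if (isEnd) depth--; else if (!isEmpty) depth++;
                if (depth == 0) { // root element closed: the document is complete
                    docs.add(s.substring(start, next));
                    start = -1;
                }
            }
            if (next < 0) break; // unterminated comment, CDATA section, PI or tag
            i = next;
        }
        if (start >= 0) throw new IllegalArgumentException("truncated document at index " + start);
        return docs;
    }

    private static int indexAfter(String s, String marker, int from) {
        int idx = s.indexOf(marker, from);
        return idx < 0 ? -1 : idx + marker.length();
    }

    /** Index just past the '>' that closes the tag at i, ignoring '>' inside quoted attribute values. */
    private static int tagEnd(String s, int i) {
        char quote = 0;
        for (int j = i + 1; j < s.length(); j++) {
            char c = s.charAt(j);
            if (quote != 0) { if (c == quote) quote = 0; }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '>') return j + 1;
        }
        return -1;
    }

    /** Parses each piece with the JDK's DOM builder. */
    public static List<Document> parseAll(String s)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Document> result = new ArrayList<>();
        for (String xml : split(s)) {
            result.add(builder.parse(new InputSource(new StringReader(xml))));
        }
        return result;
    }
}
```

Skipping comments, CDATA sections and processing instructions wholesale, and counting only complete tags, is what keeps a `</a>` inside a comment or CDATA section from being mistaken for the end of a document.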

You'll have to be prepared to devote a thread to the parser if you're
reading a stream. If you're using java.nio, you'll have to modify the parser
further so that you push data into it rather than having it pull data from a
stream. This lets you share or pool threads, which is very helpful if the
streams will be long-lived and you have too many simultaneous connections to
devote a thread to each one.
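For the NIO case, a minimal sketch of the push model, assuming the channel reading and charset decoding into a CharBuffer happen elsewhere. Instead of the parser pulling characters from a stream (which ties up a thread), the I/O code pushes each chunk in as it arrives, and complete documents come out through a callback. PushSplitter and push are hypothetical names, and the same depth-tracking assumptions apply (no internal DTD subset). The object holds per-connection state and is not thread-safe, so calls for one connection must be serialized even when they run on different pooled threads.

```java
import java.nio.CharBuffer;
import java.util.function.Consumer;

/** Hypothetical push-style splitter: chunks go in, complete documents go to a callback. */
public class PushSplitter {
    private final StringBuilder buf = new StringBuilder();
    private final Consumer<String> sink; // receives each complete document as a String
    private int pos = 0;     // next index in buf that has not been scanned
    private int start = -1;  // index in buf where the current document began, -1 if none
    private int depth = 0;   // current element depth

    public PushSplitter(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Appends the chunk's remaining chars and emits any documents that are now complete. */
    public void push(CharBuffer chunk) {
        buf.append(chunk);             // copies the chars between position and limit
        chunk.position(chunk.limit()); // mark the chunk as consumed
        scan();
    }

    private void scan() {
        while (pos < buf.length()) {
            char c = buf.charAt(pos);
            if (start < 0) {
                if (Character.isWhitespace(c)) { pos++; continue; }
                start = pos;
            }
            if (c != '<') { pos++; continue; }
            int next;
            boolean tag = false;
            if (startsWith("<!--", pos)) next = indexAfter("-->", pos + 4);
            else if (startsWith("<![CDATA[", pos)) next = indexAfter("]]>", pos + 9);
            else if (startsWith("<?", pos)) next = indexAfter("?>", pos + 2);
            else if (startsWith("<!", pos)) next = indexAfter(">", pos + 2);
            else { next = tagEnd(pos); tag = true; }
            // Construct not finished yet: keep all state and rescan from this '<' on the next push.
            if (next < 0) return;
            if (tag) {
                if (buf.charAt(pos + 1) == '/') depth--;
                else if (buf.charAt(next - 2) != '/') depth++;
                if (depth == 0) {
                    sink.accept(buf.substring(start, next));
                    buf.delete(0, next); // discard the emitted document
                    pos = 0;
                    start = -1;
                    continue;
                }
            }
            pos = next;
        }
    }

    /** Index just past the '>' closing the tag at from, ignoring '>' in quoted values; -1 if absent. */
    private int tagEnd(int from) {
        char quote = 0;
        for (int j = from + 1; j < buf.length(); j++) {
            char c = buf.charAt(j);
            if (quote != 0) { if (c == quote) quote = 0; }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '>') return j + 1;
        }
        return -1;
    }

    private int indexAfter(String marker, int from) {
        int idx = buf.indexOf(marker, from);
        return idx < 0 ? -1 : idx + marker.length();
    }

    private boolean startsWith(String prefix, int at) {
        if (at + prefix.length() > buf.length()) return false;
        for (int k = 0; k < prefix.length(); k++) {
            if (buf.charAt(at + k) != prefix.charAt(k)) return false;
        }
        return true;
    }
}
```

Because scan() returns as soon as it reaches an unfinished tag, comment or CDATA section, no thread ever blocks waiting for input. The trade-off in this sketch is that an unfinished construct is rescanned from its `<` on every push.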

It sounds a bit more complex than it really is. I'd guess about 2 days to
get it working correctly for your application. And of course, you can spend
many more tweaking it for better performance. :)

Hope that helps.

-iain



_______________________________________________
dom4j-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-user
