On 10/21/02 5:36 PM, "Cory C. Omand" <[EMAIL PROTECTED]> wrote:
> I am attempting to process a number of XML datagrams that are coming back from
> a web service. Essentially, there are multiple full XML documents being
> returned as one stream, and I would like to create a Document for each of
> them. Can anyone give me a pointer as to where I should start looking? I was
> thinking that some sort of SAX reader could separate each datagram into a
> String object, and then I could use the SAXReader.read(String) method to
> construct a Document object. Does anyone here have experience doing something
> like this? If so, I would appreciate a pointer in the right direction...

Hi,

I'm doing something similar and was able to handle things with a custom lightweight pre-parser. (We haven't gotten to the profiling stage to see how much of a hit it will be, but I don't think it will be too significant.) Unfortunately the code is part of a commercial product under development, so I can't share it (we can discuss licensing offline if you're interested). I also had to jump through a few extra hoops because we're using nio (so everything is an async CharBuffer).

However, the basic procedure is pretty straightforward. I started with the XPP parser from www.xmlpull.org (it's fast and the entire parser is in one file). Then you just need to extract the parts that properly detect tags (skipping over CDATA, comments, etc.) and track the depth of your parse. Then break off the strings at the right positions (wherever the depth returns to 0) and feed each one to the normal document builder. Since you already have the char data in memory, you can remove the character reading and buffering that the parser normally does and avoid creating entity strings, which saves a lot of time and GC activity.

You'll have to be prepared to devote a thread to the parser if you're reading a stream. If you're using java.nio, you'll have to modify the parser further so that you can push data into it rather than have it pull data from a stream. This lets you share/pool threads.
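To make the depth-tracking idea a bit more concrete, here is a minimal sketch in Java. It is not the commercial code mentioned above and it is not derived from XPP; the class name XmlStreamSplitter and its feed() method are hypothetical names made up for this illustration. It assumes the input is well-formed, and it does not handle a DOCTYPE with an internal subset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical push-style splitter: cuts a stream of concatenated XML documents apart. */
class XmlStreamSplitter {
    private final StringBuilder buf = new StringBuilder(); // data not yet emitted
    private int pos = 0;    // next unscanned index in buf
    private int depth = 0;  // current element nesting depth

    /** Push the next chunk; returns every document completed by this chunk. */
    public List<String> feed(CharSequence chunk) {
        buf.append(chunk);
        List<String> docs = new ArrayList<>();
        while (pos < buf.length()) {
            int lt = buf.indexOf("<", pos);
            if (lt < 0) { pos = buf.length(); break; } // only character data so far
            boolean tag = false;
            int end; // index just past the construct starting at lt, or -1 if incomplete
            if (startsWith(lt, "<!--")) end = after(lt + 4, "-->");
            else if (startsWith(lt, "<![CDATA[")) end = after(lt + 9, "]]>");
            else if (startsWith(lt, "<?")) end = after(lt + 2, "?>");
            else if (startsWith(lt, "<!")) end = after(lt + 2, ">"); // simplistic DOCTYPE
            else { tag = true; end = tagEnd(lt); }
            if (end < 0) { pos = lt; break; } // incomplete construct: wait for more data
            if (tag) {
                if (buf.charAt(lt + 1) == '/') depth--;       // closing tag
                else if (buf.charAt(end - 2) != '/') depth++; // opening tag (not self-closing)
                if (depth == 0) { // the root element just ended: break off one document
                    docs.add(buf.substring(0, end).trim());
                    buf.delete(0, end);
                    pos = 0;
                    continue;
                }
            }
            pos = end;
        }
        return docs;
    }

    /** Finds the '>' that ends the tag at lt, ignoring any '>' inside quoted attribute values. */
    private int tagEnd(int lt) {
        char quote = 0;
        for (int i = lt + 1; i < buf.length(); i++) {
            char c = buf.charAt(i);
            if (quote != 0) { if (c == quote) quote = 0; }
            else if (c == '"' || c == '\'') quote = c;
            else if (c == '>') return i + 1;
        }
        return -1;
    }

    private boolean startsWith(int i, String s) {
        return buf.length() >= i + s.length() && buf.substring(i, i + s.length()).equals(s);
    }

    private int after(int from, String terminator) {
        int i = buf.indexOf(terminator, from);
        return i < 0 ? -1 : i + terminator.length();
    }
}
```

Each string that feed() returns can then go to the normal document builder, for example dom4j's DocumentHelper.parseText(xml) or new SAXReader().read(new StringReader(xml)). As far as I know, SAXReader.read(String) treats its argument as a system ID/URL rather than as XML text, so it is probably not the call you want here. Because feed() is push-style and keeps its state between calls, whichever thread happens to deliver the next chunk can call it, as long as calls for one connection are not made concurrently.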
Pooling threads this way is very helpful if the streams will be long-lived and you have enough simultaneous connections that you can't devote a thread per connection.

It sounds a bit more complex than it really is. I'd guess about 2 days to get it working correctly for your application. And of course, you can spend many more tweaking it for better performance. :)

Hope that helps.

-iain

_______________________________________________
dom4j-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-user