> Actually, this very problem was the one I discussed with Bernd last spring
> during ApacheCon, as I was looking for an XML parser supporting stops in the
> middle of an XML tag. We need an XML parser that supports this kind of
> partial data and can recover from it. Not simple ...
>
Mine was partially working, though I didn't test it for all the use cases. I had tried both approaches. The first was to extend an external parser to support this, and it worked for simple cases. The second was a bit of a dumb solution, but it worked fine: manually look for the start and end of the root element. Once a complete XML document has been received, slice the buffer and pass the slice to a full-blown parser for the actual XML parsing. It kept life really simple.

However, the problem was only partly solved, as I was unable to handle misbehaving clients, for example ones that never send the end element and start a new XML document. Though rare, the implementation has to be robust enough to deal with them. I will see if I still have the code :-(

A straight out-of-the-box solution won't work, as a TCP packet can contain the end of one XML document and the start of the next one :-) This was the reason why I opted for the dumb approach. Otherwise, we make our parser slice out the complete XML and leave the unfinished data in the buffer. This is where the real challenge lies.

What I was thinking of was to cut the two passes (scan, then parse) down to one, by modifying the XML parser to work either on packets or on a pure stream. The packet approach would be more challenging: parse each packet, keep the XML tree, and return the tree as soon as it is complete. Alternatively, pass the packets on to the parser and let it parse; catch the incomplete XML/data exception and store the data in memory or on the file system. Once the XML is complete, take it and slice the stream.

Have to stop here, else it shall become an essay :-)

Good Luck

--
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
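A minimal sketch of the "dumb" approach described above: scan the accumulated packets for the start and end of the root element, slice each complete document out of the buffer, and keep the unfinished tail for the next packet. The class `XmlFramer` and its methods are hypothetical (not part of MINA or any parser). It assumes an ASCII-compatible encoding such as UTF-8, so scanning raw bytes for `<`, `>` and quotes stays safe even when a multi-byte character is split across packets. It also assumes no DOCTYPE internal subset and, like the original approach, does not deal with clients that never send the closing tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical framer: slices complete XML documents out of a TCP byte stream
 * by tracking element depth (no tree is built). Each slice is meant to be
 * handed to a full parser. Assumes UTF-8 and no DOCTYPE internal subset.
 */
public class XmlFramer {
    // Bytes received after the last complete document (the unfinished tail).
    private byte[] pending = new byte[0];

    /** Appends one packet and returns every document it completes (possibly none). */
    public List<byte[]> feed(byte[] packet) {
        byte[] buf = Arrays.copyOf(pending, pending.length + packet.length);
        System.arraycopy(packet, 0, buf, pending.length, packet.length);
        List<byte[]> docs = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = findDocumentEnd(buf, start)) >= 0) {
            docs.add(Arrays.copyOfRange(buf, start, end));
            start = end;
        }
        // Keep the unfinished tail (e.g. the start of the next document) for later.
        pending = Arrays.copyOfRange(buf, start, buf.length);
        return docs;
    }

    /** Index just past the root element's end, or -1 if the document is not complete yet. */
    static int findDocumentEnd(byte[] b, int from) {
        int depth = 0;
        int i = from;
        while (i < b.length) {
            if (b[i] != '<') { i++; continue; }          // character data is skipped
            if (startsWith(b, i, "<?")) {                // XML declaration or PI
                i = skipPast(b, i + 2, "?>");
            } else if (startsWith(b, i, "<!--")) {       // comment
                i = skipPast(b, i + 4, "-->");
            } else if (startsWith(b, i, "<![CDATA[")) {  // CDATA section
                i = skipPast(b, i + 9, "]]>");
            } else if (startsWith(b, i, "<!")) {         // DOCTYPE without internal subset
                i = skipPast(b, i + 2, ">");
            } else {
                int close = tagEnd(b, i);
                if (close < 0) return -1;                // tag split across packets
                if (b[i + 1] == '/') {                   // end tag
                    if (--depth <= 0) return close + 1;
                } else if (b[close - 1] != '/') {        // start tag
                    depth++;
                } else if (depth == 0) {                 // self-closing root element
                    return close + 1;
                }
                i = close + 1;
            }
            if (i < 0) return -1;                        // construct split across packets
        }
        return -1;
    }

    /** Finds the '>' closing the tag at i, ignoring '>' inside quoted attribute values. */
    static int tagEnd(byte[] b, int i) {
        byte quote = 0;
        for (int j = i + 1; j < b.length; j++) {
            byte c = b[j];
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return j;
            }
        }
        return -1;
    }

    /** Index just past the first occurrence of term at or after from, or -1. */
    static int skipPast(byte[] b, int from, String term) {
        for (int j = from; j + term.length() <= b.length; j++) {
            if (startsWith(b, j, term)) return j + term.length();
        }
        return -1;
    }

    static boolean startsWith(byte[] b, int at, String s) {
        if (at + s.length() > b.length) return false;
        for (int k = 0; k < s.length(); k++) {
            if (b[at + k] != (byte) s.charAt(k)) return false;
        }
        return true;
    }
}
```

Tracking nesting depth, rather than searching for the literal root end tag, keeps a `</msg>` inside a comment, a CDATA section, or a quoted attribute value from ending the document early. The sketch re-scans the pending bytes on every packet, which keeps the state trivial but becomes quadratic for very large documents; a cap on the size of `pending` would be one way to bound the damage from a client that never closes its root element.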
