Re: [mina] xml codec IoFilter?

Fernando Padilla Tue, 20 Oct 2009 10:08:20 -0700

Well, I gave it a try, and reviewing what Vysper actually did makes it seem a lot more manageable. There really are only a handful of cases: some are plainly ignored (comment, PI, doctype), and others are just text handling (CDATA, text). The more complicated one is element-tag, which has several sub-states (element name, attribute name, attribute value). But Vysper ignores the element-tag sub-states and simply waits until the element tag is all there before parsing the name and attributes (<el attr="attr"> or <el attr="attr"/>), which is a good first implementation of this and can easily be enhanced later.
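To make the "wait until the tag is all there" idea concrete, here is a minimal sketch in Java. It is not Vysper's code: the class name ChunkTokenizer and the "TAG:"/"TEXT:" token strings are made up for illustration. It assumes the decoded characters arrive as String chunks, drops comments, PIs and doctype, and holds back any markup that is still incomplete until the next chunk. A doctype with an internal subset is not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunk-at-a-time tokenizer; incomplete markup stays buffered. */
public class ChunkTokenizer {
    private final StringBuilder pending = new StringBuilder();

    /** Appends one chunk and returns the tokens that are complete so far. */
    public List<String> feed(String chunk) {
        pending.append(chunk);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < pending.length()) {
            if (pending.charAt(pos) == '<') {
                int end = endOfMarkup(pos);
                if (end < 0) break;                 // markup not complete yet
                String markup = pending.substring(pos, end);
                if (markup.startsWith("<![CDATA[")) {
                    // CDATA is plain text handling: strip the wrapper
                    tokens.add("TEXT:" + markup.substring(9, markup.length() - 3));
                } else if (!markup.startsWith("<!") && !markup.startsWith("<?")) {
                    tokens.add("TAG:" + markup);    // name/attrs parsed later
                }                                   // comment, PI, doctype: ignored
                pos = end;
            } else {
                int lt = pending.indexOf("<", pos);
                if (lt < 0) break;                  // text may continue in next chunk
                tokens.add("TEXT:" + pending.substring(pos, lt));
                pos = lt;
            }
        }
        pending.delete(0, pos);                     // keep only the unfinished tail
        return tokens;
    }

    /** Returns the index just past the markup starting at 'start', or -1. */
    private int endOfMarkup(int start) {
        String rest = pending.substring(start);
        if (rest.startsWith("<!--")) return after(rest, "-->", 4, start);
        if (rest.startsWith("<![CDATA[")) return after(rest, "]]>", 9, start);
        if (rest.startsWith("<?")) return after(rest, "?>", 2, start);
        // Too short to tell a comment or CDATA section from anything else.
        if ("<!--".startsWith(rest) || "<![CDATA[".startsWith(rest)) return -1;
        char quote = 0;                             // skip '>' inside attribute values
        for (int i = 1; i < rest.length(); i++) {
            char c = rest.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return start + i + 1;
            }
        }
        return -1;
    }

    private static int after(String rest, String terminator, int from, int start) {
        int idx = rest.indexOf(terminator, from);
        return idx < 0 ? -1 : start + idx + terminator.length();
    }
}
```

Feeding `<a x="1"><!-- c -->he` and then `llo</a` yields the start tag from the first call and the text "hello" from the second; the dangling `</a` is only reported once the closing `>` arrives.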

I wrote up a draft version of a SAX parser for Mina last week, which I think is not a bad representation of what I'm thinking. A SAX parser is free to call back to a listener as it sees fit, so I was thinking we could create another codec/processor with various options for converting the SAX event stream into a DOM event stream. Some applications want a full document (using Mina only for NIO parsing), while other applications want an unbounded stream of DOM elements (like Vysper/XMPP).
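A rough sketch of that second stage, assuming the parser reports standard org.xml.sax callbacks. The class name SaxToDomHandler and the emitDepth option are invented here for illustration and are not part of the draft. With emitDepth = 1 the handler hands over the whole document element once it closes; with emitDepth = 2 it hands over each child of the root as soon as that child closes, which is the unbounded-stream case.

```java
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX-to-DOM converter; emits elements found at a chosen depth. */
public class SaxToDomHandler extends DefaultHandler {
    private final Document factory;        // only used to create nodes
    private final int emitDepth;           // 1 = whole document, 2 = children of root
    private final Consumer<Element> sink;
    private Node current;                  // element being built, or null
    private int depth = 0;

    public SaxToDomHandler(int emitDepth, Consumer<Element> sink)
            throws ParserConfigurationException {
        this.factory = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        this.emitDepth = emitDepth;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        depth++;
        if (depth < emitDepth) return;     // wrapper element: not materialised
        Element e = factory.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (current != null) current.appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.appendChild(factory.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == emitDepth) {
            sink.accept((Element) current); // one complete DOM element is ready
            current = null;
        } else if (depth > emitDepth) {
            current = current.getParentNode();
        }
        depth--;
    }
}
```

The handler is namespace-unaware and keeps only elements, attributes and text. It can be tried out with the JDK's blocking SAXParser, although the point of the design is that a non-blocking parser would drive the same callbacks.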

Not sure where to put up the code to get comments.. maybe I should learn GitHub. :)




On 10/19/09 10:39 PM, Ashish wrote:

Actually, this very problem was the one I discussed with Bernd last spring
during ApacheCon, as I was looking for an XML parser that supports stopping
in the middle of an XML tag. We need an XML parser that supports this kind
of partial data and can recover from it. Not simple ...


Mine was partially working, though I didn't test it for all the
use cases.
I had tried both approaches. The first was to extend an external
parser to support this. It worked for simple cases.
The second was a bit of a dumb solution, but it worked fine: manually
look for the start and end (root elements) of the XML.
Once the complete XML is received, slice the buffer and pass it to a
full-blown parser to do the actual XML parsing. It kept life really simple.
However, the problem was less than solved, as I was unable to handle
misbehaving clients, such as one that never sends the end element and
starts a new XML document. Though rare, the implementation has to be
robust enough to deal with them.
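For illustration only (this is not the lost code), the second approach could look roughly like the Java sketch below. The class name XmlFramer and the maxPending cap are assumptions. The cap is one possible way to bound the damage from a client that never sends the end element; it is not a fix for that problem. The scan counts element depth tag by tag and slices the buffer when the root closes. It does not understand a '>' or '<' inside comments, CDATA sections or attribute values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical framer: slices complete XML documents out of a character buffer. */
public class XmlFramer {
    private final StringBuilder buf = new StringBuilder();
    private final int maxPending;   // assumed cap on unfinished data, in chars

    public XmlFramer(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Appends one packet and returns every document completed by it. */
    public List<String> feed(String packet) {
        buf.append(packet);
        List<String> docs = new ArrayList<>();
        int end;
        while ((end = endOfFirstDocument()) > 0) {
            docs.add(buf.substring(0, end).trim()); // hand this to a full parser
            buf.delete(0, end);                     // leftover stays for next packet
        }
        if (buf.length() > maxPending) {
            // Misbehaving client: give up on this connection's data.
            buf.setLength(0);
            throw new IllegalStateException(
                    "no complete document within " + maxPending + " chars");
        }
        return docs;
    }

    /** Index just past the first closed root element, or -1 if none yet. */
    private int endOfFirstDocument() {
        int depth = 0;
        int i = 0;
        while ((i = buf.indexOf("<", i)) >= 0) {
            int close = buf.indexOf(">", i);
            if (close < 0) return -1;               // tag split across packets
            char next = buf.charAt(i + 1);
            boolean prolog = next == '?' || next == '!';
            if (next == '/') {
                depth--;                            // end tag
            } else if (!prolog && buf.charAt(close - 1) != '/') {
                depth++;                            // start tag, not self-closing
            }
            i = close + 1;
            if (depth == 0 && !prolog) return i;    // root element just closed
        }
        return -1;
    }
}
```

A packet such as `</a><c>2` that carries the end of one document and the start of the next returns the first document and keeps `<c>2` buffered until a later packet completes it.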

I will see if I still have the code :-(

A straight out-of-the-box solution won't work, as a TCP packet can
contain the end of one XML document and the start of the next one :-)
This was the reason why I opted for the dumb approach. Otherwise we have
to make our parser slice off the complete XML and leave the unfinished
data in the buffer. This is where the real challenge lies.

What I was thinking was to reduce the two passes to one: modify the XML
parser to work either on packets or on a pure stream. The packet approach
would be more challenging. Parse each packet, keep the XML tree, and
return the tree as soon as it is complete. Or pass the packets on to the
parser and let it parse. Catch the incomplete XML/data exception and store
the data in memory or on the file system. Once the XML is complete, take
it and slice the stream.

I have to stop here, or else this will become an essay :-)

Good Luck
