Re: How to read multiple XML from socket: cannot change the protocol (Re: How to handle continuous stream of XML)

Aleksander Slominski Fri, 24 Feb 2006 12:34:52 -0800

Mike Skells wrote:

I believe that you can have PIs, comments, whitespace etc. after the root
element; is that significant for you?


----
I have the same problem in one of our applications. We looked at the format
that HTTP uses and 'borrowed' the ideas from there. Due to the volume of
data that we handle, we could not afford the overhead of scanning each
byte/char for a specific marker.

What we do is insert a Content-Length:<xx><cr> marker that details the
length of the content (which in our case is usually a document) in bytes,
after encoding.

This has a drawback in that the document needs to be prepared before being
sent, and therefore buffered, but for this application it is not an issue:
the documents that we handle are very small (a few hundred bytes), but we
have to manage 100-500 per second.
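
A minimal sketch of this kind of length-prefixed framing, in Python. The helper names (frame, read_documents and so on) are made up for illustration, and the sketch assumes the header is plain ASCII terminated by a single carriage return, as the <cr> above suggests; the real application may differ.

```python
import io

def read_exact(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended inside a document")
        buf.extend(chunk)
    return bytes(buf)

def read_header_line(stream, terminator=b"\r"):
    """Read up to the terminator; return None at a clean end of stream."""
    line = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            if line:
                raise EOFError("stream ended inside a header")
            return None
        if b == terminator:
            return bytes(line)
        line.extend(b)

def read_documents(stream):
    """Yield each document's bytes from a Content-Length framed stream."""
    while True:
        header = read_header_line(stream)
        if header is None:
            return
        name, _, value = header.partition(b":")
        if name.strip().lower() != b"content-length":
            raise ValueError("unexpected header: %r" % header)
        # The length counts bytes after encoding, not characters.
        yield read_exact(stream, int(value.decode("ascii")))

def frame(document):
    """Prefix an already encoded document with its byte length."""
    return b"Content-Length:%d\r" % len(document) + document
```

Only the short header is read byte by byte; the document body is read by length, so no marker scan over the content is needed. On a real socket, a buffered file object such as the one returned by sock.makefile("rb") could be passed as the stream.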

You could insert a marker to indicate a continuation if your documents are
large and you cannot afford the buffering.

E.g.
More:4000
<4000 bytes>
More:4000
<4000 bytes>
More:4000
<4000 bytes>
Complete:407
<407 bytes that make up the rest of the document>
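
A rough Python sketch of the More:/Complete: scheme shown above. It assumes each header line ends with a newline and that the payload bytes are followed directly by the next header with no separator; the example does not pin either point down. The function names are invented for illustration.

```python
import io

def read_chunks(stream):
    """Yield (payload, is_last) pairs from a More:/Complete: framed stream."""
    while True:
        header = stream.readline()            # e.g. b"More:4000\n"
        if not header:
            return                            # clean end of stream
        keyword, _, size_text = header.strip().partition(b":")
        if keyword not in (b"More", b"Complete"):
            raise ValueError("unexpected header: %r" % header)
        size = int(size_text.decode("ascii"))
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a chunk")
        yield payload, keyword == b"Complete"

def read_documents(stream):
    """Reassemble whole documents (this buffers; see the note below)."""
    parts = []
    for payload, is_last in read_chunks(stream):
        parts.append(payload)
        if is_last:
            yield b"".join(parts)
            parts = []
    if parts:
        raise EOFError("stream ended before a Complete: chunk")

def write_document(out, document, chunk_size=4000):
    """Write More: chunks followed by one final Complete: chunk."""
    pos = 0
    while len(document) - pos > chunk_size:
        out.write(b"More:%d\n" % chunk_size)
        out.write(document[pos:pos + chunk_size])
        pos += chunk_size
    rest = document[pos:]
    out.write(b"Complete:%d\n" % len(rest))
    out.write(rest)
```

read_documents joins the chunks only for convenience. A receiver that cannot afford the buffering could consume read_chunks directly and hand each payload to an incremental parser as it arrives.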

It works well for us. It may not suit you.

hi,

you could also use HTTP 1.1 chunked encoding for this purpose, as it does exactly what you described and _more_: it allows headers and trailing headers (good for metadata)
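
For reference, a small Python sketch of decoding the chunk framing used by HTTP 1.1 chunked encoding: a hexadecimal size line, the data, a CRLF, then a zero-size chunk followed by optional trailer fields and a blank line. It covers only the chunk framing, not a full HTTP message, and the function name and the X-Doc-Id trailer in the usage note are invented for illustration.

```python
import io

def read_chunked_body(stream):
    """Decode one chunked body; return (body_bytes, trailers_dict)."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("stream ended before the last chunk")
        # Chunk extensions after ';' are ignored in this sketch.
        size_text = size_line.split(b";", 1)[0].strip().decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            break
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("stream ended inside a chunk")
        body.extend(data)
        if stream.readline() != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    trailers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break                              # blank line ends the trailers
        name, _, value = line.decode("ascii").partition(":")
        trailers[name.strip().lower()] = value.strip()
    return bytes(body), trailers
```

Given the bytes b"5\r\n<doc>\r\n6\r\n</doc>\r\n0\r\nX-Doc-Id: 42\r\n\r\n", the function returns the body b"<doc></doc>" and the trailers {"x-doc-id": "42"}. Calling it again on the same stream reads the next document.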

If you truly cannot change the protocol, and the documents that are sent
have a <?xml ... header, then you could use that as the marker, but this does
mean that you would not know that a document is complete until the next
document is started, so you would always be one behind.

AFAICS that should not be a problem (?), as you never know that a document is complete until you read the last byte from the input, in *whatever* format the input is encoded ... so streaming can still be done, and there is no need to buffer the whole input even when <?xml ... markers are used.
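
A hedged Python sketch of splitting on the <?xml marker. It assumes every document starts with an XML declaration, that the encoding is ASCII-compatible (for example UTF-8), and that the bytes "<?xml " never occur inside a comment or CDATA section; the trailing space in the marker keeps processing instructions such as <?xml-stylesheet from matching. The function name is invented for illustration.

```python
import io

MARKER = b"<?xml "

def split_on_declaration(stream, block_size=4096):
    """Yield each document's bytes, cutting where the next declaration starts."""
    buf = b""
    search_from = 1   # index 0 holds the current document's own declaration
    while True:
        block = stream.read(block_size)
        if not block:
            break
        buf += block
        while True:
            cut = buf.find(MARKER, search_from)
            if cut == -1:
                # Resume just before the tail, in case a marker
                # straddles two blocks.
                search_from = max(1, len(buf) - len(MARKER) + 1)
                break
            if buf[:cut].strip():
                yield buf[:cut]
            buf = buf[cut:]
            search_from = 1
    if buf.strip():
        yield buf     # last document, known complete only at end of stream
```

For simplicity this version collects each document before yielding it, so it does show the "one behind" effect. A streaming variant could forward the bytes before each cut to an incremental parser as they arrive, instead of accumulating them.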


best,

alek

-----Original Message-----
From: Joseph Kesselman [mailto:[EMAIL PROTECTED]
Sent: 21 February 2006 18:21
To: [email protected]
Cc: [email protected]; Polk, John R.
Subject: Re: How to read multiple XML from socket: cannot change the protocol (Re: How to handle continuous stream of XML)
Note too that a well-formed XML document can only have one top-level element -- everything after that is normally discarded -- so that too could be used as a clue for dividing a multiple-document stream.
Or you could invent some new marker between documents, and have your input-stream filter use that to break up the docs.
Or you could just pack all the XML files into a zipfile, send that, and have your receiving tool unpack that into separate files. This would have the advantage of not having to (slightly) break people's expectations about whether what they're getting back from the server is one document or several... and might actually improve performance, especially on larger documents; XML compresses wonderfully.
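
A small Python sketch of the zipfile idea, using the standard zipfile module with an in-memory buffer. The function names and the entry names passed in are invented for illustration, and the sketch assumes the whole archive fits in memory.

```python
import io
import zipfile

def pack_documents(documents):
    """Pack a {name: xml_bytes} mapping into one in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in documents.items():
            archive.writestr(name, data)
    return buf.getvalue()

def unpack_documents(payload):
    """Return the {name: xml_bytes} mapping stored in a zip payload."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

The receiver gets each document back as a separate byte string, so each one can be handed to the parser on its own.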
Whichever approach you use, note that this isn't really an XML problem; it's a stream management problem. The XML parser expects to see a stream that presents only a single XML document, so breaking up the stream into multiple docs has to happen before it reaches the parser.
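
To illustrate the point with Python's standard xml.etree.ElementTree module (used here only as a convenient example parser, not the one under discussion): two documents concatenated into one input are rejected, while the same documents parse cleanly once they have been separated. The helper name parse_each is invented for illustration.

```python
import xml.etree.ElementTree as ET

def parse_each(documents):
    """Parse every already separated document in its own parser run."""
    for raw in documents:
        yield ET.fromstring(raw)

def parses_as_one_document(raw):
    """Return True if raw is accepted as a single well-formed document."""
    try:
        ET.fromstring(raw)
    except ET.ParseError:
        return False
    return True
```

Here parses_as_one_document(b"<a/><b/>") returns False, because the second root element is not allowed, whereas parse_each([b"<a/>", b"<b/>"]) yields two elements.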
"Ooof! There's a wasp in the room!"
"Get out! Quick! Before it gets to the tiger...!" -- Monty Python, _Matching_Tie_And_Handkerchief_
______________________________________
Joe Kesselman -- Beware of Blueshift!
"The world changed profoundly and unpredictably the day Tim Berners Lee got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------



--
The best way to predict the future is to invent it - Alan Kay

