Hi Damiano,

The approach you're taking is going to be hit or miss, depending on the 
parser and on how your data arrives over the socket connection. If 
you're sending multiple XML documents across the connection, there's a 
good chance the parser will read beyond the "end" of the current 
document and into the next one because of buffering, which means that 
data will be missing when you try to read the next document. If you're 
only sending one document at a time (and then waiting for a response 
before sending the next), the parser may block before the end of the 
document because it is still waiting for a full buffer of data.

A more general solution is to wrap the socket with a special Reader that 
(for instance) looks for the XML declaration "<?xml ...". The 
declaration is optional in XML, but if you know it's always going to be 
present (and that it won't appear inside a document, for instance within 
a CDATA section) you can look for that pattern in the text and deliver a 
logical end of input at that point (until it's time to parse the next 
document).

Is that clear? Your multi-document Reader would wrap a normal Reader as 
input. It would provide the usual Reader calls for the parser, along 
with an additional control method - "advanceDocument()", perhaps. When 
the parser calls the "close()" method you just ignore it (or set a 
logical close flag that's cleared by the advanceDocument() call). You 
would not supply any more data on read() calls past the start of the XML 
declaration until after advanceDocument() was called, though.

Hope that helps. I haven't got time to provide more details now, but I'm 
sure somebody has implemented this before and you can probably find it 
with a search.
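
For reference, here is a rough sketch of the kind of Reader described 
above. The class name MultiDocumentReader, the boolean return value of 
advanceDocument(), and the choice of "<?xml " (with a trailing space) as 
the marker are all assumptions made for illustration. It assumes every 
document starts with a declaration of exactly that form and that the 
marker never appears inside a document. It hands data over in small 
pieces so that it never waits on the socket for more than it needs.

```java
import java.io.IOException;
import java.io.Reader;

/** Sketch: splits a character stream into documents at each "<?xml " marker. */
public class MultiDocumentReader extends Reader {
    // Assumed marker; the trailing space avoids matching e.g. <?xml-stylesheet
    private static final String MARKER = "<?xml ";

    private final Reader in;
    private final StringBuilder held = new StringBuilder();  // partial marker match
    private final StringBuilder ready = new StringBuilder(); // cleared for delivery
    private int readyPos = 0;
    private boolean docStarted = false; // has the current document begun?
    private boolean logicalEof = false; // next declaration seen, waiting for advance

    public MultiDocumentReader(Reader in) {
        this.in = in;
    }

    /** Returns the next character of the current document, or -1 at its logical end. */
    private int readOne() throws IOException {
        while (true) {
            if (readyPos < ready.length()) return ready.charAt(readyPos++);
            ready.setLength(0);
            readyPos = 0;
            if (logicalEof) return -1;
            int c = in.read();
            if (c < 0) {                        // real end of the underlying stream
                if (held.length() == 0) return -1;
                ready.append(held);             // flush an incomplete marker match
                held.setLength(0);
                continue;
            }
            char ch = (char) c;
            if (ch == MARKER.charAt(held.length())) {
                held.append(ch);
                if (held.length() == MARKER.length()) {
                    if (docStarted) {           // start of the NEXT document
                        logicalEof = true;      // keep the marker in 'held'
                        return -1;
                    }
                    ready.append(held);         // declaration of the current document
                    held.setLength(0);
                    docStarted = true;
                }
            } else {
                ready.append(held);             // not the marker after all
                held.setLength(0);
                // '<' occurs only at index 0 of MARKER, so it may start a new match
                if (ch == '<') held.append(ch); else ready.append(ch);
                docStarted = true;
            }
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int first = readOne();
        if (first < 0) return -1;
        buf[off] = (char) first;
        int n = 1;
        // Hand over only what is already cleared; do not touch the socket again.
        while (n < len && readyPos < ready.length()) {
            buf[off + n++] = ready.charAt(readyPos++);
        }
        return n;
    }

    /** Call after read() has returned -1. Returns true if another document follows. */
    public boolean advanceDocument() {
        if (held.length() < MARKER.length()) return false;
        ready.setLength(0);
        readyPos = 0;
        ready.append(held);                     // replay the declaration already consumed
        held.setLength(0);
        docStarted = true;
        logicalEof = false;
        return true;
    }

    /** Ignored, so that the parser cannot close the socket between documents. */
    @Override
    public void close() {
    }
}
```

The idea would be to hand the same MultiDocumentReader to the parser 
once per document and call advanceDocument() between parses, stopping 
when it returns false. The underlying socket Reader has to be closed 
separately, since close() here does nothing.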

  - Dennis

Ing. Damiano Bolla wrote:

> Maybe I did not make clear the solution I used.
>
> It IS possible to parse an XML stream coming from a socket WITHOUT 
> knowing its length.
> What you need to do (I can post the code if you wish) is detect the 
> logical end of the XML stream (the logical end is when you go back to 
> level zero of the XML tree).
>
> To do this you need to use a SAXDriver and use a custom ContentHandler 
> class.
> It works.
>
> Damiano
>
>
> At 12.36 21/06/2002 +0100, James Strachan wrote:
>
>> From: "Ing. Damiano Bolla" <[EMAIL PROTECTED]>
>> > Not really, reading from a socket you do not know when the document
>> > is finished
>> > You really need to use a SAXDriver to parse the data stream
>> > logically and detect the logical end of file.
>> > As far as I understand this is the only option available.
>>
>> Unfortunately an XML parser can never know when the document ends. 
>> After the
>> last element tag there can be endless processing instructions or 
>> comments.
>> e.g.
>>
>> <foo>
>> <!-- end of document -->
>> <?foo a="123"?>
>>
>> So when making XML servers you typically need to send the size first, 
>> then
>> the document, so you know how far to read. e.g. like a HTTP GET or 
>> POST, the
>> content length helps a client know how far to read on the stream.
>>
>> James
>>
>
>
> Damiano Bolla, Director R&D, Infotech S.r.l
>
>
>
>
>
> -------------------------------------------------------
> Sponsored by:
> ThinkGeek at http://www.ThinkGeek.com/
> _______________________________________________
> dom4j-dev mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/dom4j-dev
>



