On Thursday, 23 June 2005 13:52, Dominik Stadler wrote:
> On Thu, 23 Jun 2005 10:29:59 +0200, Axel Weiß wrote:
>
> Hi
>
> > you could try to make your data stream look like legal xml content,
> > by preceding the stream with a xml header and an opening root
> > element. It would then look like
> >
> > <?xml version="1.0"?>
> > <data-stream>
> > <FIRSTMESSAGE>...</FIRSTMESSAGE><SECONDMESSAGE>...</SECONDMESSAGE>
> >
> > This can be parsed with SAX parser, even if the root element never
> > will be closed.
>
> Thanks for your suggestion, I forgot to mention that each XML-Message
> might contain header-information, so I could have something like
>
> <!DOCTYPE
> STREET_REF SYSTEM "fti://repository/dtd/STREET_REF">
> <FIRSTMESSAGE>...</FIRSTMESSAGE>
> <?xml version="1.0" encoding="UTF-8" standalone="no" ?> <!DOCTYPE
> STREET_REF SYSTEM "fti://repository/dtd/STREET_REF">
> <SECONDMESSAGE>...</SECONDMESSAGE>
>
> which would not lead to a valid XML-Message, because there are
> <?xml...>- and <!DOCTYPE..>-elements somewhere in between...

Dominik,

then I'd suggest using the SAX parser. What you have is a sequence of 
complete XML documents. You will then be able to parse each message, 
since the SAX parser tells you when its input has reached the end, that 
is, the closing tag of the root element. Your input then points to the 
beginning of the next message, and you can parse it after resetting the 
SAX parser.

> We know, however, that every XML-Message starts on
> a new line, so we might need to concatenate lines until we have a
> complete XML-Message and then use that... Also kind of ugly and
> potentially time-consuming, but I don't see any other way except to
> write a small specialized XML-Grammar and this still wouldn't be able
> to parse messages in encodings like UTF-16.

I don't think you need to write your own parser. Just divide your input 
stream into XML documents (the SAX parser will help you with this). 
Since each document carries its own XML declaration, you should even be 
able to parse documents in different encodings following each other.
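
As a small illustration of the encoding point, here is a sketch in Python with the standard xml.sax module (again not the parser discussed in this thread). It assumes the stream has already been divided into separate documents. TextCollector and parse_each are hypothetical names chosen for this example.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Remember the root element name and all character data of a document."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.text = []

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name

    def characters(self, content):
        self.text.append(content)


def parse_each(documents):
    """Parse every document on its own, so each one may use its own encoding."""
    results = []
    for doc in documents:
        handler = TextCollector()
        # parseString sets up a new parser per call, so the encoding
        # declared by one document does not affect the next one.
        xml.sax.parseString(doc, handler)
        results.append((handler.root, "".join(handler.text)))
    return results
```

For example, a list holding one UTF-8, one ISO-8859-1 and one UTF-16 document (the last with a byte order mark) can be passed to parse_each in a single call.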

Cheers,
                        Axel

-- 
Humboldt-Universität zu Berlin
Institut für Informatik
Signalverarbeitung und Mustererkennung
Dipl.-Inf. Axel Weiß
Rudower Chaussee 25
12489 Berlin-Adlershof
+49-30-2093-3050
** www.freesp.de **

