Sorry, it seems to be a platform-independent problem.
I could reproduce your problem on my Linux machine as well.
I just didn't have a genuinely ISO-8859-1-encoded XML test document at first.
In the code I sent, the TextInputStream does in fact provide the correct
character encoding, but it turns out that the DocumentBuilder seems to
look only at the content of the stream to determine the encoding. So it
doesn't help to set the correct character encoding on the stream; the
encoding has to be declared inside the stream itself (here, in the
first line of your XML document).
The only ways I can think of to work around this problem are:
1. Write the declaration into your file (as you suggested).
2. Somehow write the declaration into your stream first (I don't know yet
how to do this).
3. Convert your stream's encoding (maybe by reading bytes from the input
stream and writing UTF-8 to the parser, but how?).
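Thinking about option 3 a bit more, an untested sketch might look like the
following. It decodes the file as ISO-8859-1 with a TextInputStream and
re-encodes it as UTF-8 into a com.sun.star.io.Pipe, which is then handed to
the parser. It assumes that DocumentBuilder.parse accepts a Pipe as its input
stream and that the parser treats undeclared input as UTF-8; I have not
verified either. The function name ParseLatin1Xml is made up for illustration.

```basic
' Untested sketch: re-encode an ISO-8859-1 file as UTF-8 before parsing.
' Assumption: DocumentBuilder.parse can read from a com.sun.star.io.Pipe.
Function ParseLatin1Xml(sUrl As String) As Object
    Dim oSFA As Object, oInpStream As Object, oTextInp As Object
    Dim oPipe As Object, oTextOut As Object, oDB As Object

    ' Open the file and decode its bytes as ISO-8859-1.
    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    oInpStream = oSFA.openFileRead(sUrl)
    oTextInp = createUnoService("com.sun.star.io.TextInputStream")
    oTextInp.setInputStream(oInpStream)
    oTextInp.setEncoding("ISO-8859-1")

    ' A Pipe is both an output and an input stream:
    ' write UTF-8 into one end, let the parser read from the other.
    oPipe = createUnoService("com.sun.star.io.Pipe")
    oTextOut = createUnoService("com.sun.star.io.TextOutputStream")
    oTextOut.setOutputStream(oPipe)
    oTextOut.setEncoding("UTF-8")

    ' readLine drops the line end, so add a line feed back.
    Do While Not oTextInp.isEOF()
        oTextOut.writeString(oTextInp.readLine() & Chr(10))
    Loop
    oTextOut.closeOutput()   ' signals end of data to the reader
    oInpStream.closeInput()

    oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
    ParseLatin1Xml = oDB.parse(oPipe)
End Function
```

You would then call it as domDoc = ParseLatin1Xml(sUrl) and still check the
result for null. Note that the whole document is buffered in the pipe, so
this is only sensible for reasonably small files.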
Sorry again for not really helping you.
Maybe somebody else?
Btw: to get the build number without writing code, you can open the
About box from the Help menu and type "sdt" while holding down the
Control key for all three letters.
Christian Andersson wrote:
> Hmm this is not working for me, I still get a null object from oDB.parse...
>
> What system do you test this on?
> I am running this on Windows Server 2003 and OpenOffice 2.0.
> (I know that there is a way to get the build number, but I keep forgetting it.)
>
> Christoph Jopp wrote:
>
>> Kjære Christian,
>> for meg følgende code virker [Dear Christian, the following code works for me]:
>>
>> oSFA = createUNOService ("com.sun.star.ucb.SimpleFileAccess")
>> oInpStream = oSFA.openFileRead(sUrl)
>> oTextInpStream = createUnoService("com.sun.star.io.TextInputStream")
>> oTextInpStream.setInputStream(oInpStream)
>> oTextInpStream.setEncoding("iso-8859-1")
>> oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
>> domDoc = oDB.parse(oTextInpStream)
>> oInpStream.closeInput
>>
>> Sorry for my bad Norwegian, but it's been a long time since I was there.
>> About the code:
>> You have to use a TextInputStream to be able to set the encoding.
>>
>> Hope it helps.
>> Ha det bra [take care],
>> Christoph
>>
>>
>> Christian Andersson wrote:
>>
>>> I have a small problem. In StarBasic I'm using (almost) the following
>>> code (there might be small mistakes, since I'm writing this from memory)
>>> to read and parse an XML document:
>>>
>>> oSFA = createUNOService ("com.sun.star.ucb.SimpleFileAccess")
>>> oInpStream = oSFA.openFileRead(sUrl)
>>> oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
>>> domDoc = oDB.parse(oInpStream)
>>> oInpStream.closeInput
>>>
>>> This works almost perfectly for me. I say almost, since there are
>>> some XML documents that it cannot read.
>>>
>>> The problem is that some documents (which are generated by a
>>> third-party system that I cannot change) do not declare that they
>>> are XML documents, like this:
>>> <?xml version="1.0" encoding="utf-8" ?>
>>>
>>> They just start with the XML tags directly, like this:
>>>
>>> <test>
>>> <test2>
>>> .....
>>> </test2>
>>> </test>
>>>
>>> This is all fine; I have other XML documents that also look like this,
>>> and OpenOffice can read and parse them.
>>> However, these problematic documents use national characters (åæø)
>>> encoded in iso-8859-1, and this is the problem.
>>> If they were encoded in utf-8, OpenOffice could read the document without
>>> any encoding declaration, but with iso-8859-1 the oDB.parse
>>> function just returns null. No errors, exceptions, or anything, just null.
>>>
>>> If I manually add <?xml version="1.0" encoding="iso-8859-1" ?>
>>> at the start of that file, OpenOffice can read it perfectly.
>>>
>>> So is there some way I can force the DOM parser to use iso-8859-1
>>> instead of utf-8?
>>> It would be great if I could do
>>> domDoc = oDB.parse(oInpStream, "iso-8859-1")
>>> and it would work, but from what I can see there is no function for this
>>> in the DocumentBuilder, nor is there anything like it in the
>>> input stream object or the SimpleFileAccess object.
>>>
>>> I should be able to get around this problem by programmatically making a
>>> copy of the file, inserting the <?... part first, and then reading the
>>> XML from my modified file, but this is only a last-resort
>>> solution.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]