Re: [api-dev] encoding flaw in dictionary entries

Stephan Bergmann Wed, 30 Nov 2005 23:42:11 -0800

Marc Santhoff wrote:

On Tuesday, 29.11.2005, at 09:56 +0100, Stephan Bergmann wrote:

Marc Santhoff wrote:
On Monday, 28.11.2005, at 10:29 +0100, Stephan Bergmann wrote:
Marc Santhoff wrote:
Hi,

I'm using dictionaries from Basic code and noticed a problem. When the
search word from a dictionary entry is inserted into a Writer doc, its
special characters are not shown correctly.

Try this in a German localized version:

sub encError
        dls = createUnoService("com.sun.star.linguistic2.DictionaryList")
        dic = dls.getDictionaryByName("soffice.dic")
        entries = dic.getEntries()
        msgbox entries(16).getDictionaryWord()
end sub

In a German language version of OO.o 1.1.x this should read
"Bemaßungslinien", but the char "ß" is not converted correctly. This
holds true for the German OO.o2.0-RC1/Windows, too.

Is this worth filing an issue or is it pilot error?
It sure sounds like an error (so please file an issue): XDictionaryEntry.getDictionaryWord returns a UNO string, which is Unicode, so no excuse to garble an "ß" (and Basic's msgbox command should also be fully Unicode...).
Thanks for replying.

I only thought I was missing some conversion function or the like
because all umlauts are garbled too. They are shown as two chars in a
Writer doc. And from the GUI everything works as expected ...
You mean, adding text to a Writer doc via some Basic code (where the text to be added is represented as a literal Basic string) leads to garbled characters? That's strange. Maybe Andreas Bregas knows whether there is some part of Basic or the Basic IDE that works with locale-dependent text encodings instead of Unicode?



Yes, that's what I wanted to say.

Another test for the German localized OO.o:

sub encError2
        BasicLibraries.LoadLibrary("Tools")
        dls = createUnoService("com.sun.star.linguistic2.DictionaryList")
        dic = dls.getDictionaryByName("soffice.dic")
        entries = dic.getEntries()
        tmpDoc = CreateNewDocument("swriter")
        csr = tmpDoc.Text.createTextCursor()
        tmpDoc.Text.string = entries(16).getDictionaryWord() ' "ß"
        tEnd = tmpDoc.Text.getEnd()
        tEnd.String = entries(46).getDictionaryWord() ' "ö"
end sub

This garbles the special chars, too.

Regards,
Marc


Two things I noticed when trying to reproduce this:

1. You must be using a non-UTF-8 locale (probably 8859-1); check the environment variable LANG. If you set LANG to something like "de_DE.UTF-8" the problem should go away.

2. If you modify the Basic script by adding

    tEnd = tmpDoc.Text.getEnd()
    tEnd.String = "äöü"
  end sub

to the end, you see that Basic is not the culprit, as the umlauts show up correctly in the Writer doc, regardless of the LANG setting.

I suspect that the OOo dictionary implementation erroneously uses osl_getThreadTextEncoding() (which depends on LANG) to translate the (obviously UTF-8 encoded) strings within the dictionary database to Unicode. Please update the issue (did you already write one?) accordingly.
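
The suspected mechanism can be reproduced outside of OOo. The following is only a sketch in Python, not the actual dictionary code: the helper name decode_dictionary_bytes is made up for illustration, and it assumes the stored bytes are UTF-8 and that the locale-dependent encoding is ISO-8859-1. Decoding the same bytes with the wrong encoding turns each umlaut or "ß" into two characters, which matches what Marc sees in the Writer doc.

```python
# Hypothetical illustration only; this is not the OOo dictionary implementation.
# Assumption: dictionary entries are stored as UTF-8 bytes on disk.

WORD = "Bemaßungslinien"


def decode_dictionary_bytes(raw: bytes, encoding: str) -> str:
    """Decode raw dictionary bytes using the given text encoding."""
    return raw.decode(encoding)


# The bytes as they would be stored if the file is UTF-8 encoded.
raw = WORD.encode("utf-8")

# Decoding with a locale-dependent 8-bit encoding (assumed ISO-8859-1 here)
# maps each byte to one character, so the two-byte "ß" becomes two characters.
garbled = decode_dictionary_bytes(raw, "iso-8859-1")

# Decoding with the encoding the data was actually written in is lossless.
correct = decode_dictionary_bytes(raw, "utf-8")

print(repr(garbled))
print(repr(correct))
```

If this is what happens, the fix belongs in the dictionary reader (always decode as UTF-8), not in Basic or in the user's LANG setting.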


-Stephan
