Re: [lingu-dev] OOo Greek thesaurus

Marcin Miłkowski Wed, 11 May 2005 10:07:58 -0700

Petros V wrote:

Dear friends,

We are trying to create the Greek thesaurus for OOo. At first we
thought that the problem came only from the awk script (the big-endian
words etc.). Thanks to Daniel Naber, we found out that Pavel Janik's
script works well, but only when the two files (wordlist and
trimthess) contain no Greek characters. So the problem is related to
awk and how it recognises Greek characters. Does anybody know how we
can solve this problem?
For more information, I should mention that the experiments are taking
place on Windows XP, with the two files saved as ANSI. If they are saved
as UTF-8, Unicode or Unicode Big Endian, OOo hangs.
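To illustrate why the save option matters to a byte-oriented tool, here is a small Python sketch (not part of the original thread) of the bytes each option would produce for the same short line. The option names match Notepad's Save As dialog; the sketch assumes that "ANSI" on a Greek-locale Windows system means code page 1253 and that the "Unicode" options write UTF-16 with a byte-order mark. The function name `encoded_variants` is hypothetical.

```python
import codecs


def encoded_variants(text):
    """Return the bytes written for `text` under each save option (assumed mapping)."""
    return {
        # Assumption: "ANSI" on a Greek-locale Windows system is code page 1253.
        "ANSI": text.encode("cp1253"),
        # UTF-8 preceded by a byte-order mark.
        "UTF-8": codecs.BOM_UTF8 + text.encode("utf-8"),
        # "Unicode": UTF-16 little-endian preceded by a byte-order mark.
        "Unicode": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
        # "Unicode big endian": UTF-16 big-endian preceded by a byte-order mark.
        "Unicode big endian": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    }


if __name__ == "__main__":
    # Print each variant as space-separated hex bytes (Python 3.8+).
    for name, data in encoded_variants("α|β").items():
        print(f"{name:20} {data.hex(' ')}")
```

Only the ANSI variant keeps one byte per character. In UTF-8 each Greek letter takes two bytes, and in both UTF-16 variants every character, including the ASCII `|` separator, is paired with a second byte (a NUL for ASCII), plus a byte-order mark at the start. A script that treats the input as single-byte text will see something quite different from the Greek words.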

First of all, no version of AWK (not even GAWK) supports Unicode or any other double-byte encoding, so bear this in mind: the files parsed by the AWK script must use a single-byte encoding. I suppose you are trying to build an OOo thesaurus for version 1.x; if that's the case, you cannot simply convert the resulting .dat and .idx files to UTF-8, because the index values (byte offsets into the .dat file) would then be completely wrong.
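The index problem can be seen with a simplified model. The Python sketch below (not part of the original thread) assumes that the .idx file maps each headword to the byte offset at which its entry starts in the .dat file; the `word|synonym` entry layout and the helper names `build_dat` and `read_entry` are hypothetical, not the exact OOo 1.x format. It builds the data in ISO-8859-7, then converts only the .dat bytes to UTF-8 while keeping the old offsets.

```python
def build_dat(entries, encoding):
    """Write hypothetical 'word|synonym' lines and record each entry's byte offset."""
    data = bytearray()
    index = {}
    for word, synonym in entries:
        index[word] = len(data)  # byte offset, not character offset
        data += f"{word}|{synonym}\n".encode(encoding)
    return bytes(data), index


def read_entry(dat, offset, encoding):
    """Read one line starting at a byte offset, as an index lookup would."""
    end = dat.index(b"\n", offset)
    return dat[offset:end].decode(encoding, errors="replace")


entries = [("αγάπη", "έρωτας"), ("θησαυρός", "λεξικό")]

# Build the .dat data and index in a single-byte encoding: offsets are consistent.
dat_8bit, index = build_dat(entries, "iso8859_7")

# Naively convert only the .dat data to UTF-8, keeping the old index.
dat_utf8 = dat_8bit.decode("iso8859_7").encode("utf-8")

if __name__ == "__main__":
    offset = index["θησαυρός"]
    print(read_entry(dat_8bit, offset, "iso8859_7"))  # the correct entry
    print(read_entry(dat_utf8, offset, "utf-8"))      # lands inside the first entry
```

Because every Greek letter grows from one byte to two, each stored offset points too early in the converted file. Rebuilding the index from the UTF-8 data (`build_dat(entries, "utf-8")`) gives consistent offsets again, which is why the data and index have to be generated together in the same encoding.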

Why is ANSI encoding wrong for Greek characters? Is there no single-byte encoding for your language? Or maybe you should use some ISO-8859-x encoding?
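Greek does have single-byte encodings: ISO-8859-7 and Windows code page 1253 (which is usually what "ANSI" means on a Greek-locale Windows system). This short Python check, using a hypothetical helper `byte_lengths`, confirms that a Greek word round-trips through both with one byte per letter, while UTF-8 needs two.

```python
def byte_lengths(word, encodings=("iso8859_7", "cp1253", "utf-8")):
    """Map each encoding to the number of bytes needed to store `word`."""
    result = {}
    for enc in encodings:
        data = word.encode(enc)  # raises UnicodeEncodeError if a letter is missing
        if data.decode(enc) != word:
            raise ValueError(f"{enc} does not round-trip {word!r}")
        result[enc] = len(data)
    return result


if __name__ == "__main__":
    word = "θησαυρός"
    print(len(word), byte_lengths(word))
```

The two single-byte code pages agree on most Greek letters but not on every position, so a file written in one should not be read as the other.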

Hope that helps a little -
Marcin
