Dear friends,
We are trying to create the Greek thesaurus for OOo. At first we thought that the problem came only from the awk script (the big endian words etc.). Thanks to Daniel Naber we found out that Pavel Janik's script works well, but only when the two files (wordlist and trimthess) contain no Greek characters. So the problem is related to awk and how it handles Greek characters. Does anybody know how we can solve this problem? For more information, I should add that the experiments are taking place on Win XP, with the two files saved as ANSI. If they are saved as UTF-8, Unicode, or Unicode Big Endian, then OOo hangs.
First of all, as far as I know, no version of AWK (GAWK included) supports Unicode or any other multi-byte encoding, so bear this in mind. You must use a single-byte encoding for the files to be parsed by the AWK script. I suppose you are trying to build the OOo thesaurus for the 1.x version; if that's the case, you cannot simply convert the resulting .dat and .idx files to UTF-8, because the index values would be completely wrong.
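To illustrate the index problem, here is a minimal Python sketch. It assumes the index stores byte offsets into the .dat file, which is my reading of "index values" and not something confirmed here. The two entries and the `byte_offsets` helper are made up for the example; they are not taken from the real thesaurus files.

```python
# Two hypothetical thesaurus entries, one per line (illustrative only).
lines = ["καλός|1\n", "ωραίος|1\n"]

def byte_offsets(lines, encoding):
    """Return the byte offset at which each line starts in the given encoding."""
    offsets, pos = [], 0
    for line in lines:
        offsets.append(pos)
        pos += len(line.encode(encoding))
    return offsets

# ISO-8859-7 uses one byte per Greek letter, UTF-8 uses two.
iso_offsets = byte_offsets(lines, "iso8859_7")
utf8_offsets = byte_offsets(lines, "utf-8")
print("ISO-8859-7 offsets:", iso_offsets)
print("UTF-8 offsets:     ", utf8_offsets)
```

The second entry starts at a different byte position in each encoding, so offsets computed against the single-byte file would point to the wrong places once the data is converted to UTF-8.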
Why is ANSI encoding wrong for Greek characters? Is there no single-byte encoding for your language? Or maybe you should use some ISO-8859-x encoding?
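If the word lists only exist as UTF-8, one possible way to get a single-byte version is to re-encode them before running the AWK script. The following Python sketch is an assumption on my part, not part of the existing build scripts: `reencode_greek` is a hypothetical helper, and ISO-8859-7 (or Windows-1253, which I believe is what "ANSI" means on a Greek Windows XP) is only a guess at the right target encoding.

```python
import unicodedata

def reencode_greek(utf8_bytes, target="iso8859_7"):
    """Re-encode UTF-8 bytes into a single-byte Greek encoding.

    Hypothetical helper: 'target' may also be "cp1253" (Windows Greek).
    """
    # "utf-8-sig" drops a leading byte order mark if an editor wrote one.
    text = utf8_bytes.decode("utf-8-sig")
    # Compose accented letters into single code points; the single-byte
    # encodings have no combining accents.
    text = unicodedata.normalize("NFC", text)
    # Raises UnicodeEncodeError if a character has no single-byte mapping.
    return text.encode(target)

# In-memory sample standing in for a wordlist file saved as UTF-8 with a BOM.
sample = "\ufeffθησαυρός\n".encode("utf-8")
converted = reencode_greek(sample)
print(len(sample), "bytes in UTF-8 ->", len(converted), "bytes in ISO-8859-7")
```

Letting the encode step fail loudly on unmappable characters seems safer here than replacing them silently, since a silently altered word would never match a lookup later.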
Hope that helps a little - Marcin
