Hi,
The .dic and .aff files are NOT in UTF-8. Each .aff/.dic pair has its own 8-bit character encoding. MySpell is designed around the rule that 1 byte = 1 char. There are a number of encodings people can choose from for their dictionaries, but they must all follow that 1 byte = 1 char rule.
OOo itself will convert from Unicode (16 bit) to the target encoding used by that dictionary in MySpell and will convert from the target encoding back to Unicode.
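To make that round trip concrete, here is a rough sketch in Python (not the actual OOo or MySpell code, which is C++). The function names are made up for illustration, and "latin-1" is only an assumed example of a dictionary's 8-bit encoding; each real .aff/.dic pair uses whatever encoding its author chose.

```python
# Illustrative sketch only: these helper names are hypothetical,
# not part of OOo or MySpell.

def to_dict_encoding(word: str, encoding: str) -> bytes:
    """Convert a Unicode word to the dictionary's 8-bit encoding."""
    return word.encode(encoding)


def from_dict_encoding(raw: bytes, encoding: str) -> str:
    """Convert bytes in the dictionary's encoding back to Unicode."""
    return raw.decode(encoding)


word = "caf\u00e9"  # "cafe" with an acute accent on the e
raw = to_dict_encoding(word, "latin-1")  # assumed example encoding

# In an 8-bit encoding every character is exactly one byte.
print(len(word), len(raw))

# The same word in UTF-8 needs two bytes for the accented character,
# which is why UTF-8 breaks the 1 byte = 1 char rule.
print(len(word.encode("utf-8")))
```

The point of the sketch is only that the conversion happens at the boundary: Unicode on the application side, one byte per character on the dictionary side.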
You need to look at the code that reads and parses lines of text so that it does not matter whether carriage return / linefeed pairs or just linefeeds are used to end lines. Both the Thesaurus and the MySpell code handle that properly.
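For the line-ending point, a minimal sketch of the idea in Python (again, not the MySpell code itself; split_lines is a hypothetical helper): read the file as raw bytes, split on linefeed, and drop any carriage return left at the end of a line. That way CR/LF and plain LF files parse identically on any OS.

```python
# Illustrative sketch only: split_lines is a hypothetical helper,
# not a MySpell function.

def split_lines(data: bytes) -> list:
    """Split raw file contents into lines, accepting LF or CR/LF endings."""
    lines = data.split(b"\n")
    # A trailing newline leaves one empty element at the end; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    # Remove the carriage return that CR/LF files leave on each line.
    return [line.rstrip(b"\r") for line in lines]


def read_lines(path: str) -> list:
    """Read a file in binary mode so the OS does no newline translation."""
    with open(path, "rb") as f:
        return split_lines(f.read())


# The same content with DOS and Unix endings yields the same lines.
print(split_lines(b"SET ISO8859-1\r\nTRY abc\r\n"))
print(split_lines(b"SET ISO8859-1\nTRY abc\n"))
```

Working on bytes rather than decoded text also keeps the 8-bit dictionary data untouched until you decide which encoding to apply.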
Kevin
On Mar 13, 2005, at 12:55 PM, Luke Myers wrote:
Kevin Hendricks,
With multiple formats for encoding text (Unicode, DOS, 8-bit, multi-bit, etc.), I'm curious about how OOo keeps track of dictionary and affix files over multiple OSes. I was reading over your munch and unmunch but neither gave me any insight. It is my understanding that the dic files are in UTF-8. My main concern is over the newline and accented characters. The program I'm writing now depends on a language file for input. I wrote the program in Linux, porting it to DOS-16 with DJGPP and, therefore, Windows. It works perfectly in Linux. I think DOS doesn't recognize the newline chars correctly. Because of this, none of my structures are populated and the program does not run correctly. It's a little bit tougher when you have OS and language portability in mind. The program to date can be found at http://conjugnu.sourceforge.net/download/nightly/4.0/.
BTW, has MySpell changed from OOo 1.1.4 (do I need to download the new source)?
Cheers, Luke
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
