Hello Javier and Lazlo!

>> Hunspell doesn't break the text in OpenOffice.org. OOo uses IBM ICU library
>> for this task: http://wiki.services.openoffice.org/wiki/ICU
>>
>> It seems, updating IBM ICU in OpenOffice.org has generated your problem.
>> Maybe new ICU files have overwritten the good syntax definitions of
>> ZWSP tokenization.
>> We need a new l10n OpenOffice.org issue with detailed bug report.
>>

What Lazlo said is correct: the word-separation problem lies within the breakiterator, which is implemented by means of ICU. But I dare to assume that even though your ZWSP solution is currently the only option for you, it might not give quite as good a result as you want.

For example, suppose a text delimited by the breakiterator consists of, let's say, three Khmer words: XYZ, abcd and PQRST. As I understand it, they will be displayed (and presented to the spellchecker) like this: XYZabcdPQRST. If there happens to be a single error in the second word, you can't help but have the spellchecker return a suggestion for the whole text, that is, all three words.

I don't know how long such constructs without spaces can get in Khmer. But if they tend to be long, it will become troublesome, especially when you have to deal with more than one error, e.g. one in the first word and one in the third, each with more than one reasonable choice available. How are you going to handle the multitude of suggestions if you have to return suggestions for the text consisting of all three words? It seems this cannot be fixed at all by just modifying the current spellchecker implementation.

I see two options to solve this:

a) If ZWSP is simple AND fast enough to apply AND open source, it might be integrated into the breakiterator, and then it may be fine. (We may still have follow-up issues with attributes being applied where they should not be, though. I've already seen similar issues with Chinese translation and Hangul/Hanja conversion.)

b) You have to wait for our grammar checking (or better, proofreading) framework to be finished, because for that we will pass complete sentences on to the checker.
Right now (in the CWS gcframework) we have implemented it to the point where it can be used for automatic checking and marking of wrong text, but without suggestions being available in the context menu.

Basically I'm just saying you should probably be prepared to implement a grammar checker later on, since that is likely to be the only correct solution to the problem.

Regards,
Thomas
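
To make the XYZabcdPQRST example concrete, here is a minimal Python sketch. It is not the OpenOffice.org breakiterator or the Hunspell API: the tokenize() and misspelled() helpers and the three-word dictionary are hypothetical stand-ins, and the only assumption about ZWSP is that it is the character U+200B placed between words.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, assumed to be inserted between words

# Hypothetical dictionary holding the three placeholder words from the example.
DICTIONARY = {"XYZ", "abcd", "PQRST"}


def tokenize(text):
    """Stand-in for a break iterator: split on whitespace and on ZWSP.

    Python's \\s does not match U+200B, so it is listed explicitly.
    """
    return [token for token in re.split(r"[\s\u200b]+", text) if token]


def misspelled(text, dictionary):
    """Return the tokens that the (hypothetical) spellchecker would flag."""
    return [token for token in tokenize(text) if token not in dictionary]


# The same three words with one error in the second word ("abxd").
unbroken = "XYZ" + "abxd" + "PQRST"             # no break opportunities
with_zwsp = ZWSP.join(["XYZ", "abxd", "PQRST"])  # ZWSP between words

print(misspelled(unbroken, DICTIONARY))   # whole run is flagged as one token
print(misspelled(with_zwsp, DICTIONARY))  # only the faulty word is flagged
```

Without break opportunities the checker only ever sees the whole run, so any suggestion has to cover all three words. With two faulty words in one run, each having several candidates, the number of whole-run suggestions is the product of the per-word candidate counts.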
