Re: [lingu-dev] ZWSP

Thomas Lange - Sun Germany - ham02 - Hamburg Thu, 17 Jan 2008 01:16:53 -0800

Hello Javier and Lazlo!

>> Hunspell doesn't break the text in OpenOffice.org. OOo uses IBM ICU library
>> for this task: http://wiki.services.openoffice.org/wiki/ICU
>>
>> It seems, updating IBM ICU in OpenOffice.org has generated your problem.
>> Maybe new ICU files have overwritten the good syntax definitions of
>> ZWSP tokenization.
>> We need a new l10n OpenOffice.org issue with detailed bug report.
>>


What Lazlo said is correct: the problem of word separation lies within
the breakiterator, which is implemented by means of ICU.

But I dare to assume that even though your solution with ZWSP is
currently the only option for you, it might not give you quite as good
a result as you want.
For example, suppose the breakiterator returns a single piece of text
consisting of, let's say, 3 Khmer words XYZ, abcd and PQRST. As I
understand it, they will be displayed (and presented to the
spellchecker) like this: XYZabcdPQRST. If there happens to be a single
error within the second word, the spellchecker can't help but return a
suggestion for the whole text, that is, all 3 words.
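
To make the example concrete, here is a minimal Python sketch. The
function split_words is a hypothetical stand-in for the breakiterator,
not ICU's actual API, and the three words are just the placeholders
from above. It only shows that a splitter which ignores ZWSP hands the
whole run on as one token, while one that breaks on ZWSP yields the
three words.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, invisible when rendered


def split_words(text, break_on_zwsp):
    """Hypothetical stand-in for the breakiterator (not ICU's actual API)."""
    # Python's \s does not match U+200B, so it has to be listed explicitly.
    pattern = r"[\s\u200b]+" if break_on_zwsp else r"\s+"
    return [piece for piece in re.split(pattern, text) if piece]


# The three placeholder words, joined by ZWSP instead of visible spaces.
text = ZWSP.join(["XYZ", "abcd", "PQRST"])

# Ignoring ZWSP: one token, so the checker sees the whole run at once.
print(len(split_words(text, break_on_zwsp=False)))

# Breaking on ZWSP: each word can be checked on its own.
print(split_words(text, break_on_zwsp=True))
```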

I don't know how long such constructs without spaces might get in Khmer.
But if they tend to be longer, it will become troublesome, especially if
you may have to deal with more than one error: e.g. one in the first
word and one in the third, each with more than one reasonable choice
available. How are you going to handle the multitude of suggestions if
you have to return suggestions for the text consisting of all three
words?
It seems this cannot be fixed at all by just modifying the current
spellchecker implementation.
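
A rough sketch of why the suggestions multiply, assuming (purely for
illustration) that the checker has a list of candidate corrections per
word and must return replacements for the whole run. The candidate
lists and the function name run_suggestions are made up.

```python
from itertools import product


def run_suggestions(candidates_per_word):
    """Build every whole-run replacement from per-word candidates."""
    # One suggestion per combination of per-word choices.
    return ["".join(combo) for combo in product(*candidates_per_word)]


# Hypothetical candidates: word 1 has two plausible fixes, word 2 is
# correct as it stands, word 3 has three plausible fixes.
candidates = [
    ["XYZ", "XYY"],
    ["abcd"],
    ["PQRST", "PQRSU", "PQRSV"],
]

suggestions = run_suggestions(candidates)
print(len(suggestions))  # 2 * 1 * 3 = 6 whole-run suggestions
```

The count is the product of the per-word counts, so it grows quickly
with the length of the run and the number of errors in it.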

There are two options I see to solve this:
a) If ZWSP is simple AND fast enough to apply AND open source, it
   might be integrated into the breakiterator, and that may be
   fine. (We may still have follow-up issues with attributes being
   applied where they should not have been, though. I've already
   seen similar issues with Chinese translation and Hangul/Hanja
   conversion.)
b) You have to wait for our grammar checking (or better: proofreading)
   framework to get finished, because for that we will pass
   complete sentences on to the checker.
   Right now (in the CWS gcframework) we have implemented it to
   the point where it can be used for automatic checking and
   marking of wrong text, but without suggestions being available
   in the context menu.

Basically I'm just saying you should probably be prepared to implement a
grammar checker later on since that is likely to be the only correct
solution to the problem.


Regards,
Thomas

