Hi Thomas,

We have an easy way of typing ZWSP. It is on the spacebar of the Khmer keyboard (a normal space is Shift+spacebar). We teach everybody to use it when they type.

It was working before with ICU 2.6, but there seems to be a regression after the upgrade to the newer version.

As you say, using ZWSP is definitely not the best solution. The correct thing would be for the applications to do tokenization and line breaking themselves, so that we do not have to separate the words, but this will take some time (they already do it for Thai, though, through ICU). The problem is that all the applications need to implement the algorithm for the text to be transportable. For example, we cannot stop using ZWSP in webpages until Internet Explorer manages line-breaking for Khmer... and that is probably going to take a very long time.
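
A rough Python sketch of what ZWSP-based line breaking amounts to, for illustration only. It is not the ICU algorithm; the function name wrap_at_zwsp and the choice to measure width in code points are assumptions made for the example (real layout would measure rendered glyph widths).

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def wrap_at_zwsp(text, width):
    """Greedy wrap that only breaks where the typist inserted a ZWSP.

    Width is counted in code points, which is a simplification.
    The ZWSP characters themselves are dropped from the output lines.
    """
    lines, current = [], ""
    for word in text.split(ZWSP):
        # Start a new line if adding this word would exceed the width.
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    # Placeholder Latin "words" stand in for Khmer words here.
    sample = ZWSP.join(["aaa", "bbb", "cc"])
    print(wrap_at_zwsp(sample, 5))
```

Without the ZWSP characters the text is one unbreakable run, which is why every application has to implement dictionary-based breaking before they can be dropped.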

It will be great to work with the new framework. So it is probably a good idea to start working on a dic file that has all the word types and characteristics that will be used by the framework...

Cheers,

Javier

Németh László wrote
Hi Thomas,

2008/1/17, Thomas Lange - Sun Germany - ham02 - Hamburg <[EMAIL PROTECTED]>:
There are two options I see to solve this:

These options are not mutually exclusive: using optional ZWSP
characters as word breaks will not modify the grammar checking of the
sentences.  The problem is that ZWSP is not a word break character
now, although ZWSP is "used to indicate word boundaries to text
processing systems when using scripts that do not use explicit spacing"
(http://en.wikipedia.org/wiki/Space_(punctuation)).
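
A minimal Python sketch of the difference, for illustration only. It assumes a simple regex tokenizer (the names WORD_BOUNDARY and tokenize are hypothetical, not part of any breakiterator): ZWSP has Unicode general category Cf, so ordinary whitespace splitting ignores it unless it is listed explicitly as a boundary.

```python
import re
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Hypothetical boundary set: ordinary whitespace plus ZWSP.
WORD_BOUNDARY = re.compile(r"[\s\u200b]+")


def tokenize(text):
    """Split text into words, treating ZWSP like a space."""
    return [token for token in WORD_BOUNDARY.split(text) if token]


if __name__ == "__main__":
    text = "ab" + ZWSP + "cd ef"
    print(unicodedata.category(ZWSP))  # format character, not whitespace
    print(len(text.split()))           # whitespace-only split misses the ZWSP
    print(tokenize(text))              # ZWSP-aware split
```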

Regards,
László

a) If ZWSP is simple AND fast enough to apply AND OpenSource it
   might be integrated into the breakiterator and thus it may be
   fine. (We may still have follow-up issues with attributes being
   applied where they should not have been, though. I've already
   seen similar issues with Chinese translation and Hangul/Hanja
   conversion).
b) You have to wait for our grammar checking (or better proof
   reading) framework to get finished, because for that we will
   pass complete sentences on to the checker.
   Right now (in the CWS gcframework) we have implemented it to
   the point where it can be used for automatic checking and
   marking of wrong text but without having suggestions available
   in the context menu.

Basically I'm just saying you should probably be prepared to implement a
grammar checker later on since that is likely to be the only correct
solution to the problem.


Regards,
Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


