Re: [lingu-dev] ZWSP

Javier SOLA Thu, 17 Jan 2008 03:39:38 -0800

Hi Thomas,

We have an easy way of typing ZWSP. It is on the spacebar of the Khmer keyboard (SP is shift+spacebar). We teach everybody to use it when they type.

It was working before with ICU 2.6, but there seems to be a regression after the upgrade to the newer version.

As you say, using ZWSP is definitely not the best solution. The correct thing would be for applications to do tokenization and line breaking themselves, so that we do not have to separate the words, but this will take some time (they do it already for Thai, though, through ICU). The problem is that all the applications need to implement the algorithm for the text to be transportable. For example, we cannot stop using ZWSP in webpages until Internet Explorer manages line-breaking for Khmer... and that is probably going to take a very long time.

It will be great to work with the new framework. So it is probably a good idea to start working on a dic file that has all the word types and characteristics that will be used by the framework...


Cheers,

Javier

Németh László wrote

Hi Thomas,

2008/1/17, Thomas Lange - Sun Germany - ham02 - Hamburg <[EMAIL PROTECTED]>:

There are two options I see to solve this:


These options are not mutually exclusive: using optional ZWSP
characters as word breaks will not modify the grammar checking of the
sentences.  The problem is that ZWSP is not treated as a word break
character now, although ZWSP is "used to indicate word boundaries to
text processing systems when using scripts that do not use explicit
spacing" (http://en.wikipedia.org/wiki/Space_(punctuation)).
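To illustrate what treating ZWSP as a word break would mean, here is a minimal Python sketch. It is not the breakiterator or ICU code, and the helper names split_words and strip_zwsp are hypothetical. It assumes words are separated either by ordinary whitespace or by U+200B, and uses Latin placeholder text instead of real Khmer.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def split_words(text):
    """Return the words of text, breaking on whitespace and on ZWSP.

    Hypothetical helper for illustration only. Python's str.split()
    does not treat U+200B as whitespace, so it is handled explicitly.
    """
    words = []
    for chunk in text.split():
        # Drop empty pieces caused by leading, trailing or doubled ZWSP.
        words.extend(piece for piece in chunk.split(ZWSP) if piece)
    return words


def strip_zwsp(text):
    """Return text with every ZWSP removed (what the reader sees)."""
    return text.replace(ZWSP, "")


sample = "first" + ZWSP + "second third"
print(split_words(sample))
print(strip_zwsp(sample))
```

Each word found this way could then be passed to the spell checker one at a time, while the visible text stays unchanged.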

Regards,
László

a) If ZWSP is simple AND fast enough to apply AND open source, it
   might be integrated into the breakiterator and thus it may be
   fine. (We may still have follow-up issues with attributes being
   applied where they should not have been, though. I've already
   seen similar issues with Chinese translation and Hangul/Hanja
   conversion.)
b) You have to wait for our grammar checking (or better, proof
   reading) framework to get finished, because for that we will
   pass complete sentences on to the checker.
   Right now (in the CWS gcframework) we have implemented it to
   the point where it can be used for automatic checking and
   marking of wrong text, but without having suggestions available
   in the context menu.

Basically I'm just saying you should probably be prepared to implement a
grammar checker later on since that is likely to be the only correct
solution to the problem.


Regards,
Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


