Hi Thomas,
We have an easy way of typing ZWSP. It is on the spacebar of the Khmer
keyboard (a regular SP is Shift+Spacebar). We teach everybody to use it
when they type.
It was working before with ICU 2.6, but there seems to be a
regression after the upgrade to the newer version.
As you say, using ZWSP is definitely not the best solution. The correct
approach would be for applications to handle tokenization and line
breaking themselves, so that we do not have to separate the words, but
this will take some time (they already do it for Thai, though, through ICU).
The problem is that all applications need to implement the algorithm
for the text to be portable. For example, we cannot stop using ZWSP
in web pages until Internet Explorer handles line breaking for Khmer...
and that is probably going to take a very long time.
It would be great to work with the new framework, so it is probably a
good idea to start working on a dic file that has all the word types and
characteristics the framework will use...
Cheers,
Javier
Németh László wrote
Hi Thomas,
2008/1/17, Thomas Lange - Sun Germany - ham02 - Hamburg <[EMAIL PROTECTED]>:
There are two options I see to solve this:
These options are not mutually exclusive: using optional ZWSP
characters as word breaks will not modify the grammar checking of the
sentences. The problem is that ZWSP is not treated as a word break
character now, although ZWSP is "used to indicate word boundaries to text
processing systems when using scripts that do not use explicit spacing"
(http://en.wikipedia.org/wiki/Space_(punctuation))
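To illustrate the idea of treating ZWSP as a word boundary, here is a minimal Python sketch. It is not how ICU's break iterator or the checker actually works; it only assumes that ZWSP (U+200B) should split words the same way ordinary whitespace does. The function name `split_words` is hypothetical.

```python
# Hypothetical sketch: treat ZWSP (U+200B) as a word boundary, like a space.
ZWSP = "\u200b"


def split_words(text: str) -> list[str]:
    """Split text into words on ordinary whitespace and on ZWSP.

    Python's str.split() does not treat U+200B as whitespace, so each
    whitespace-separated chunk is split on ZWSP explicitly. Empty pieces
    (e.g. from doubled ZWSPs) are dropped.
    """
    words = []
    for chunk in text.split():  # ordinary whitespace first
        words.extend(piece for piece in chunk.split(ZWSP) if piece)
    return words


if __name__ == "__main__":
    # Three Khmer words separated by ZWSP, as typed with the Khmer keyboard.
    print(split_words("ខ្ញុំ\u200bស្រលាញ់\u200bអ្នក"))
```

With this rule, a spell checker would see each ZWSP-delimited run as a separate word, while the ZWSP itself stays invisible when the text is rendered.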
Regards,
László
a) If ZWSP is simple AND fast enough to apply AND OpenSource, it
might be integrated into the breakiterator and thus it may be
fine. (We may still have follow-up issues with attributes being
applied where they should not have been, though. I've already
seen similar issues with Chinese translation and Hangul/Hanja
conversion.)
b) You have to wait for our grammar checking (or, better, proof-
reading) framework to be finished, because there we will
pass complete sentences on to the checker.
Right now (in the CWS gcframework) we have implemented it to
the point where it can be used for automatic checking and
marking of wrong text, but without suggestions being available
in the context menu.
Basically I'm just saying you should probably be prepared to implement a
grammar checker later on since that is likely to be the only correct
solution to the problem.
Regards,
Thomas
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]