Hi László, Thomas,
We have ended up including ZWSP specifically in the word boundary rules
of OpenOffice (originally taken from ICU).
I am meanwhile writing a proposal to the Unicode Consortium to revert
ZWSP to its original state. Unicode is not supposed to change characters
just like that, and much less retroactively with an erratum to the
standard, bypassing all the committees, as was done here.
With respect to the Wikipedia entry, it is ambiguous: "used to..." can
also be short for "It is used to...", which I think is the meaning that
the writer intended (because ZWSP was changed in May 2008).
http://en.wikipedia.org/wiki/Space_(punctuation)
The good news is that it works again, and that we can do spellchecking.
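To make the idea concrete, here is a minimal Python sketch of treating ZWSP (U+200B) as a word boundary before spellchecking. It is only an illustration, not the OpenOffice or ICU rule set: the function names `split_words` and `misspelled` and the `known_words` set are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def split_words(text):
    """Split text into words, treating ZWSP like ordinary whitespace.

    str.split() alone does not break on U+200B (it is a format
    character, not whitespace), so it is mapped to a space first.
    """
    return text.replace(ZWSP, " ").split()


def misspelled(text, known_words):
    """Return the words of `text` that are not in `known_words`.

    `known_words` is a hypothetical stand-in for a real dictionary.
    """
    return [w for w in split_words(text) if w not in known_words]
```

With this, a string such as "ab\u200bcd ef" is checked as three separate words instead of two.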
We have written a dictionary-based breakiterator for Khmer, copying the
one in ICU 4.0, but OpenOffice 3.0 uses ICU 3.6, so we cannot include it
yet. I am looking into ICU 3.6, but Thai seems to be rule-based; only
Chinese and Japanese have dictionary-based breakiterators. Is this
right?
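For readers unfamiliar with the term, the general idea of a dictionary-based break iterator can be sketched in Python as a greedy longest-match scan. This is an assumption-laden illustration of the technique, not the ICU algorithm and not our Khmer code: the function `segment`, its parameters, and the toy dictionary are all hypothetical.

```python
def segment(text, dictionary, max_len=None):
    """Return break positions found by greedy longest-match lookup.

    At each position the longest dictionary word starting there is
    taken; if none matches, the scan advances by one character.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    boundaries = [0]
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                break
        else:
            length = 1  # no dictionary word here: step one character
        i += length
        boundaries.append(i)
    return boundaries


# Latin letters are used only to keep the example readable.
print(segment("thecatsat", {"the", "cat", "sat"}))  # [0, 3, 6, 9]
```

Greedy matching can pick a wrong split when one word is a prefix of another, which is one reason production break iterators use more elaborate strategies.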
Cheers,
Javier
Németh László wrote:
Hi Thomas,
2008/1/17, Thomas Lange - Sun Germany - ham02 - Hamburg <[EMAIL PROTECTED]>:
There are two options I see to solve this:
These options are not mutually exclusive: using optional ZWSP
characters as word breaks will not modify the grammar checking of the
sentences. The problem is that ZWSP is not a word break character
now, but ZWSP "used to indicate word boundaries to text processing
systems when using scripts that do not use explicit spacing";
(http://en.wikipedia.org/wiki/Space_(punctuation))
Regards,
László
a) If ZWSP is simple AND fast enough to apply AND OpenSource it
might be integrated into the breakiterator and thus it may be
fine. (We may still have follow-up issues with attributes being
applied where they should not have been, though. I've already
seen similar issues with Chinese translation and Hangul/Hanja
conversion.)
b) You have to wait for our grammar checking (or better, proofreading)
framework to get finished, because for that we will
pass complete sentences on to the checker.
Right now (in the CWS gcframework) we have implemented it to
the point where it can be used for automatic checking and
marking of wrong text, but without having suggestions available
in the context menu.
Basically I'm just saying you should probably be prepared to implement a
grammar checker later on since that is likely to be the only correct
solution to the problem.
Regards,
Thomas
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]