Re: [lingu-dev] ZWSP

Javier SOLA Mon, 04 Aug 2008 03:35:09 -0700

Hi László, Thomas,

We have ended up including ZWSP specifically in the word boundary rules of OpenOffice (originally taken from ICU).

Meanwhile, I am writing a proposal to the Unicode Consortium to revert ZWSP to its original state. Unicode is not supposed to change characters just like that, and much less retroactively with an erratum to the standard, bypassing all the committees as was done here.

With respect to the Wikipedia entry, it is ambiguous: "used to..." can also be short for "It is used to...", which I think is the meaning that the writer wanted to give (because ZWSP was changed in May 2008).


http://en.wikipedia.org/wiki/Space_(punctuation)

The good news is that it works again, and that we can do spellchecking.
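
As a rough illustration of why this matters for spellchecking (a minimal sketch only, not the actual OpenOffice or ICU rules; the function name split_words is made up for this example): once ZWSP counts as a word boundary, a run of text in a script without visible spaces can be cut into words that a spellchecker can look up one by one.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# ZWSP is not matched by \s in Python, so it is listed explicitly
# next to ordinary whitespace.
_BOUNDARY = re.compile(r"[\s\u200b]+")


def split_words(text):
    """Split text into words, treating ZWSP like a visible space.

    Hypothetical helper for illustration; real break iterators also
    handle punctuation, numbers, and script-specific rules.
    """
    return [word for word in _BOUNDARY.split(text) if word]


if __name__ == "__main__":
    sample = "first" + ZWSP + "second third"
    for word in split_words(sample):
        print(word)
```

Without the explicit ZWSP in the boundary set, "first" and "second" above would be handed to the spellchecker as one unknown word.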

We have written a dictionary-based breakiterator for Khmer, copying the one in ICU 4.0... but OpenOffice 3.0 uses ICU 3.6, so we cannot include it yet. I am looking into ICU 3.6, but Thai seems to be rule-based; only Chinese and Japanese have dictionary-based breakiterators. Is this right?
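
To make the idea of a dictionary-based breakiterator concrete, here is a toy sketch of greedy longest-match segmentation. It is an assumption-laden illustration, not the ICU algorithm and not our Khmer code; the function name segment and the tiny word list are invented for the example.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation of unspaced text.

    At each position, take the longest dictionary word that starts
    there; if none matches, emit a single character and move on.
    Illustrative only: real implementations look ahead to avoid
    bad greedy choices and work on grapheme clusters, not code points.
    """
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: fall back to one character.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


if __name__ == "__main__":
    toy_dictionary = {"the", "cat", "sat"}  # hypothetical word list
    print(segment("thecatsat", toy_dictionary))
```

The boundaries found this way are what would let the spellchecker work on text that contains no ZWSP at all.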


Cheers,

Javier



Németh László wrote

Hi Thomas,

2008/1/17, Thomas Lange - Sun Germany - ham02 - Hamburg <[EMAIL PROTECTED]>:

There are two options I see to solve this:


These options are not mutually exclusive: using optional ZWSP
characters as word breaks will not modify the grammar checking of the
sentences.  The problem is that ZWSP is not a word break character
now, but ZWSP "used to indicate word boundaries to text processing
systems when using scripts that do not use explicit spacing";
(http://en.wikipedia.org/wiki/Space_(punctuation))

Regards,
László

a) If ZWSP is simple AND fast enough to apply AND OpenSource it
   might be integrated into the breakiterator and thus it may be
   fine. (We may still have follow-up issues with attributes being
   applied where they should not have been though. I've already
   seen similar issues with Chinese translation and Hangul/Hanja
   conversion).
b) You have to wait for our grammar checking (or better proof
   reading) framework to get finished, because for that we will
   pass complete sentences on to the checker.
   Right now (in the CWS gcframework) we have implemented it to
   the point where it can be used for automatic checking and
   marking of wrong text but without having suggestions available
   in the context-menu.

Basically I'm just saying you should probably be prepared to implement a
grammar checker later on since that is likely to be the only correct
solution to the problem.


Regards,
Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


