Hi Javier,

Could you create an issue (or use the existing one, if you already have one) and attach the patch to it? I will then see how we can integrate the patch.

Thanks,
Karl.

Javier SOLA wrote:
Hi Eike and Karl,

After some tinkering, and working with Jens Herden, we got automatic
dictionary-based line-breaking and word boundaries for Khmer working on
OOO300_m2.

We did it for ICU 4.0 (based on the Thai breaker), back-ported it to ICU
3.6, and integrated it into OOo. The word boundaries took us a while,
because in OOo they are all rule-based (except for Chinese and Japanese).
We had to make an exception for Khmer in the Unicode breakiterator, not
allowing it to build a rule-based iterator; it then looked for the
iterator in ICU, and it worked.
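
For readers unfamiliar with dictionary-based breaking, here is a minimal
Python sketch of the general idea. It uses plain greedy longest match
against a tiny, hypothetical word list. This is a simplification for
illustration only: it is not the algorithm of the ICU Thai breaker, and it
is not the code in the patch. The names `segment`, `WORDS` and `TEXT` are
made up for this example.

```python
def segment(text, words):
    """Split unspaced text into tokens by greedy longest match.

    `words` is a set of known words. Characters that start no known
    word are emitted one at a time as their own tokens.
    """
    longest = max(map(len, words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a word matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                break
        else:
            size = 1  # no dictionary word starts here
        tokens.append(text[i:i + size])
        i += size
    return tokens


# Hypothetical four-entry word list standing in for a real Khmer dictionary.
I, LOVE, LANGUAGE, KHMER = "ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"
WORDS = {I, LOVE, LANGUAGE, KHMER}

# Unspaced input built from the same entries.
TEXT = I + LOVE + LANGUAGE + KHMER

print(segment(TEXT, WORDS))
```

A rule-based iterator cannot find these boundaries because nothing in the
character sequence itself marks where one word ends; only the word list
does. A real breaker also has to handle ambiguous matches and words
missing from the dictionary, which this sketch does not attempt.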

It was a little tricky because new resources were added (the Khmer
dictionary), and they are not in the patchable source but in the
resource bundle icu36dt.dat. We had to create a new version of that file
and replace it in the tarball.

Of course, this will not go into 3.0, but we can use it later for 3.1...
depending on what version of ICU goes in.

This will be a big leap forward for us. Having to teach people to use
ZWSP has always been a problem, because the concept of "word" is
unclear in Khmer.

The same can easily be done for Burmese, Lao, and other languages whose
scripts do not separate words, relying on word lists.

Javier




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

