Takuhiro Nishioka wrote on 2000-08-03 19:56 UTC:
> > Given the limitations of the mechanism, I guess it is best to treat each
> > Kanji character as a word on its own.
> 
> I don't know about the mechanism.  In my humble opinion, I
> think that it is a bit inconvenient that each Kanji
> character is treated as a word, at least when editing
> Japanese texts.

If you would prefer each consecutive run of Kanji to be treated
as a single word, then replace

  SetCharacterClassRange(0x3300, 0x9fff, -1); /* CJK Ideographs */

and

  SetCharacterClassRange(0xf900, 0xfaff, -1); /* CJK Ideographs */

by

  SetCharacterClassRange(0x3300, 0x9fff, 0x4e00); /* CJK Ideographs */

and

  SetCharacterClassRange(0xf900, 0xfaff, 0x4e00); /* CJK Ideographs */

Is this more useful?

Here is how this works: SetCharacterClassRange(a, b, c) assigns the
class code c to all characters in the interval [a, b]. A class code of
-1 means that each character's own code number serves as its class
code, so every character in the range forms a class of its own. Word
selection extends from the selected character to the left and right
until it hits a character with a different class code. Usually, the
class code is the code of one representative character, e.g. the first
character of some alphabet (0x4e00 above is the first CJK Unified
Ideograph). I didn't invent that mechanism; I just reimplemented what
xterm already had to make it more memory efficient, and I added a
first draft of new classes for Unicode characters beyond Latin-1.
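
To make the selection rule concrete, here is a minimal sketch of the
mechanism in C. It is not the code from xterm: the names
set_class_range, char_class and select_word, the fixed-size table, and
the rule that a later assignment overrides an earlier one are all
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range table; xterm's real data structure may differ. */
struct class_range { int first, last, cls; };

#define MAX_RANGES 64
static struct class_range ranges[MAX_RANGES];
static size_t n_ranges = 0;

/* Assign class code cls to every character in [first, last].
 * Returns 0 on success, -1 if the table is full. */
int set_class_range(int first, int last, int cls)
{
    assert(first <= last);
    if (n_ranges == MAX_RANGES)
        return -1;
    ranges[n_ranges].first = first;
    ranges[n_ranges].last = last;
    ranges[n_ranges].cls = cls;
    n_ranges++;
    return 0;
}

/* Return the class code of ch. The most recent matching range wins
 * (an assumption of this sketch). A stored class of -1, or no
 * matching range at all, means the character is its own class. */
int char_class(int ch)
{
    size_t i = n_ranges;
    while (i-- > 0) {
        if (ch >= ranges[i].first && ch <= ranges[i].last)
            return ranges[i].cls == -1 ? ch : ranges[i].cls;
    }
    return ch;
}

/* Extend the selection from line[pos] to the left and right until a
 * character with a different class code is hit. The resulting word is
 * line[*start] .. line[*end], both inclusive. Requires pos < len. */
void select_word(const int *line, size_t len, size_t pos,
                 size_t *start, size_t *end)
{
    int cls = char_class(line[pos]);
    size_t s = pos, e = pos;

    while (s > 0 && char_class(line[s - 1]) == cls)
        s--;
    while (e + 1 < len && char_class(line[e + 1]) == cls)
        e++;
    *start = s;
    *end = e;
}
```

With set_class_range(0x3300, 0x9fff, -1), selecting a Kanji in this
sketch picks just that one character; after
set_class_range(0x3300, 0x9fff, 0x4e00), the selection grows to cover
the whole surrounding run of Kanji.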

You can also call SetCharacterClassRange from the command line using
the -cc option, as described in the CHARACTER CLASSES section of the
xterm man page. (I have just noticed that the man page still talks
about 8-bit codes in that section, which will need a minor rewrite once
Robert's patch has been integrated into the main release.)
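
For illustration, the following sketch shows how a -cc style
specification (a comma-separated list of range:value pairs, where a
range is either a single number or low-high) could be turned into
arguments for SetCharacterClassRange. This is not xterm's parser: the
function parse_class_spec and the struct are hypothetical, only decimal
numbers are handled, and whether a given xterm build accepts code
values above 255 on the command line is an assumption you would have
to check.

```c
#include <assert.h>
#include <stdlib.h>

/* One parsed "range:value" pair (hypothetical helper type). */
struct class_range { int first, last, cls; };

/* Parse "range:value[,range:value...]", where range is either N or
 * LOW-HIGH, all in decimal. Stores at most max entries in out.
 * Returns the number of entries parsed, or -1 on a syntax error or
 * if the specification holds more than max entries. */
int parse_class_spec(const char *spec, struct class_range *out, int max)
{
    const char *p = spec;
    int n = 0;

    assert(spec != NULL && out != NULL);
    while (*p != '\0') {
        char *end;
        long lo, hi, cls;

        if (n == max)
            return -1;
        lo = strtol(p, &end, 10);         /* start of the range */
        if (end == p)
            return -1;
        p = end;
        hi = lo;                          /* single number: N-N */
        if (*p == '-') {
            p++;
            hi = strtol(p, &end, 10);     /* end of the range */
            if (end == p)
                return -1;
            p = end;
        }
        if (*p != ':')
            return -1;
        p++;
        cls = strtol(p, &end, 10);        /* class code */
        if (end == p)
            return -1;
        p = end;
        if (lo < 0 || hi < lo)
            return -1;
        out[n].first = (int)lo;
        out[n].last = (int)hi;
        out[n].cls = (int)cls;
        n++;
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;
    }
    return n;
}
```

In this sketch, the two replacement calls above would correspond to the
specification "13056-40959:19968,63744-64255:19968", which is simply
0x3300-0x9fff and 0xf900-0xfaff mapped to 0x4e00 in decimal.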

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
