Takuhiro Nishioka wrote on 2000-08-03 19:56 UTC:
> > Given the limitations of the mechanism, I guess it is best to treat each
> > Kanji character as a word on its own.
>
> I don't know about the mechanism. In my humble opinion, I
> think that it is a bit inconvenient that each Kanji
> character is treated as a word, at least when editing
> Japanese texts.

If you prefer that each consecutive sequence of Kanji is treated
like a single word, then replace

  SetCharacterClassRange(0x3300, 0x9fff, -1); /* CJK Ideographs */

and

  SetCharacterClassRange(0xf900, 0xfaff, -1); /* CJK Ideographs */

by

  SetCharacterClassRange(0x3300, 0x9fff, 0x4e00); /* CJK Ideographs */

and

  SetCharacterClassRange(0xf900, 0xfaff, 0x4e00); /* CJK Ideographs */

Is this more useful?

How this works is as follows: SetCharacterClassRange(a, b, c) assigns
the class code c to all characters in the interval [a, b]. Class code -1
means that each character's own code number is used as its class code.
Word selection extends from the selected character to the left and
right until it hits a character with a different class code. Usually,
the class code is one representative character, e.g. the first
character of some alphabet. I didn't invent that mechanism; I just
reimplemented what xterm already had to be more memory efficient, and
I added a first draft of new classes for Unicode characters beyond
Latin-1.

You can also call SetCharacterClassRange from the command line using
the -cc option, as described in the CHARACTER CLASSES section of the
xterm man page. (I just noticed that the man page still talks about
8-bit codes in that section, which will need a minor rewrite once
Robert's patch has been integrated into the main release.)

Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/