Re: Additional multi-byte functions

Tomohiro KUBOTA Mon, 14 May 2001 02:37:38 -0700
Hi,

At Thu, 10 May 2001 20:45:33 +0200,
Pablo Saratxaga <[EMAIL PROTECTED]> wrote:

> > I don't think, you want to replace strlen with mbslen very frequently!
> 
> In fact all the str* family shouldn't been named like that, but rather
> b* (as in byte); they don't deal with strings of text (that is, strings of
> chars) but of strings of opaque and meaningless bytes. They are useful to
> know the size of a string in memory, but not its size on display; it is too
> bad that so many books have that ascii assumptions that text is a string
> of chars with char=byte.

Sure.  In non-internationalized software, three concepts are confused:
the number of bytes in a string, the number of characters in a string,
and the number of columns a string occupies on display.  Thus, to
internationalize software, these concepts need to be separated.  For
example, if one variable is used to mean both the number of BYTES and
the number of CHARACTERS, the software will need one more variable to
distinguish them.  If a variable is used for BYTES, CHARACTERS, and
COLUMNS, the software will need two more variables.
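
To illustrate the first two of these, here is a minimal sketch in
standard C.  The function names byte_count and char_count are made up
for this example; they are not existing library functions.  The sketch
assumes the caller has already selected a locale with setlocale(), and
it leaves out COLUMNS, which would additionally need something like
POSIX wcwidth() or wcswidth().

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Number of BYTES: what strlen() has always reported. */
size_t byte_count(const char *s)
{
    return strlen(s);
}

/* Number of CHARACTERS in the current locale's encoding.
   Returns (size_t)-1 if the string holds an invalid or
   truncated multibyte sequence. */
size_t char_count(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t count = 0;

    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    while (left > 0) {
        size_t n = mbrlen(s, left, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or incomplete sequence */
        if (n == 0)
            break;                  /* reached a NUL character */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

For a pure ASCII string the two functions agree.  In a UTF-8 locale
they should differ as soon as the string contains a non-ASCII
character, which is exactly the case where a single shared variable
stops being enough.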

There are many more problems.  For example, one sometimes wants to
find the character just before the one currently pointed to.

  char * character_in_focus;
  char * previous_character;

  previous_character = character_in_focus - 1;

This code is not appropriate in an internationalization context.
To be encoding-independent, previous_character must be calculated
by scanning from the top of the buffer, though for some encodings,
such as UTF-8, there are encoding-dependent ways to avoid such an
expensive scan.
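
Both approaches can be sketched roughly as follows.  The names
prev_char_generic and prev_char_utf8 are hypothetical, chosen only for
this example.  The first assumes the locale has been set with
setlocale() and that the buffer starts in the initial shift state; the
second assumes the buffer holds valid UTF-8 and does not check it.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Encoding-independent: walk forward from the top of the buffer with
   the current locale's multibyte decoder, remembering the start of the
   last character seen.  Returns NULL if focus is at the top of the
   buffer or an invalid sequence is found before it. */
const char *prev_char_generic(const char *buf, const char *focus)
{
    mbstate_t st;
    const char *p = buf;
    const char *prev = NULL;

    memset(&st, 0, sizeof st);
    while (p < focus) {
        size_t n = mbrlen(p, (size_t)(focus - p), &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return NULL;            /* invalid or truncated sequence */
        if (n == 0)
            n = 1;                  /* an embedded NUL is one byte long */
        prev = p;
        p += n;
    }
    return prev;
}

/* UTF-8 only: step backwards over continuation bytes (10xxxxxx) until
   a lead byte or the top of the buffer is reached. */
const char *prev_char_utf8(const char *buf, const char *focus)
{
    const char *p = focus;

    if (p <= buf)
        return NULL;                /* nothing before the first character */
    do {
        --p;
    } while (p > buf && ((unsigned char)*p & 0xC0) == 0x80);
    return p;
}
```

The generic version costs time proportional to the distance from the
top of the buffer, while the UTF-8 version looks at only a few bytes,
which is the trade-off described above.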

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/