Line breaking for Unicode text is different from the well-known "look
for spaces" technique, which works only for languages that separate
words with spaces, as most European languages do. In CJK text, a break
can occur between any adjacent ideographic characters.
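
To make the difference concrete, here is a minimal sketch in C. It is
not the code from linebreak.c and not the full Unicode algorithm: it
only applies the two rules mentioned above (break after a space, break
between two ideographs). The names is_ideographic and mark_breaks are
made up for this illustration, and the ideograph test covers only the
CJK Unified Ideographs block, whereas the real algorithm uses a line
breaking class for every character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if c lies in the CJK Unified Ideographs
   block (U+4E00..U+9FFF). A simplification for illustration only. */
static int is_ideographic(uint32_t c)
{
    return c >= 0x4E00 && c <= 0x9FFF;
}

/* Hypothetical sketch: for n code points in s, set brk[i] to 1 if a
   line break is allowed immediately before s[i], and to 0 otherwise. */
void mark_breaks(const uint32_t *s, size_t n, char *brk)
{
    assert(s != NULL && brk != NULL);
    for (size_t i = 0; i < n; i++) {
        brk[i] = 0;
        if (i == 0)
            continue;           /* never break before the first character */
        if (s[i] == ' ')
            continue;           /* keep a space attached to what precedes it */
        if (s[i - 1] == ' ')
            brk[i] = 1;         /* "look for spaces" rule */
        else if (is_ideographic(s[i - 1]) && is_ideographic(s[i]))
            brk[i] = 1;         /* break between adjacent ideographs */
    }
}
```

With only the space rule, a run of ideographs would never get a break
opportunity at all, which is why the simple technique fails for CJK
text.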

An implementation of the Unicode Line Breaking algorithm is available
from
       ftp://ftp.ilog.fr/pub/Users/haible/gnu/linebreak-0.1.tar.gz

It implements line breaking for UTF-8 strings and, through iconv, also
for strings in any iconv-supported encoding. It will be released under
the LGPL.
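
For readers who have not used iconv, the conversion step could look
roughly like the sketch below. This is an assumption about the general
approach, not the code in linebreak.c. The function name to_utf8 is
hypothetical; iconv_open, iconv and iconv_close are the standard
calls. The sketch assumes the POSIX prototype in which the input
pointer is a char **; on systems where iconv() declares it as
const char **, the cast would have to be adjusted.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert inlen bytes of in, encoded in fromcode,
   to UTF-8 in out (capacity outsize bytes). Returns the number of bytes
   written, or (size_t)-1 if the encoding is unknown, the input is
   invalid, or the output buffer is too small. */
size_t to_utf8(const char *fromcode, const char *in, size_t inlen,
               char *out, size_t outsize)
{
    assert(fromcode != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* Assumes iconv() takes char **; the input itself is not modified. */
    char *inptr = (char *)in;
    char *outptr = out;
    size_t inleft = inlen;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outsize - outleft;
}
```

A caller would pass an encoding name such as "ISO-8859-1" as fromcode
and then run the line breaking on the UTF-8 result.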

It's going to be used in GNU gettext, but it would also fit nicely in
GNU textutils (fmt, fold) and in groff, and could be useful for word
wrapping in text editors (vim etc.).

If you want to test it, compile with

   "gcc -DHAVE_ICONV -DICONV_CONST=const -DTEST1 linebreak.c" (UTF-8 only)

or

   "gcc -DHAVE_ICONV -DICONV_CONST=const -DTEST2 linebreak.c" (any encoding)

Enjoy!

     Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
