David Starner wrote:
> The dict standard dictates that all data crossing the wire shall be in
> UTF-8. Unfortunately, the reference implementation doesn't even try to
> get it right. I was discussing the issue with a maintainer of a Russian
> dictionary for dict, and part of the problem was that there was no UTF-8
> regex engine. Does anyone know of a UTF-8 regex engine, preferably one
> that can be plugged into a GPL'ed C program easily?
Vim's regexp engine supports UTF-8 quite well. However, it is not a
separate library; it is closely tied to the rest of Vim. You could
separate it with a bit of work. It supports very powerful constructs
(comparable to Perl's), but the syntax is that of Vi (with Vim's
extensions), which differs somewhat from sed and POSIX. The Vim license
is now GPL compatible, so licensing should not be a problem.
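
To illustrate what "supports UTF-8" means for a matcher: an engine that
works byte by byte lets "." consume a single byte, which splits a
multi-byte character such as a Cyrillic letter. A UTF-8 aware engine has
to advance by a whole encoded sequence instead. The following is only a
minimal sketch of that one step in plain C. It is not taken from Vim or
from the dict server; the names utf8_seq_len and match_chars are
hypothetical, and the toy pattern language (literal bytes plus ".") is an
assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence whose lead
 * byte is s[0], or 0 if s[0] cannot start a sequence. This looks at the
 * lead byte only; it does not reject overlong forms or code points
 * beyond U+10FFFF. */
size_t utf8_seq_len(const unsigned char *s)
{
    if (s[0] < 0x80)
        return 1;                   /* 0xxxxxxx: ASCII */
    if ((s[0] & 0xE0) == 0xC0)
        return 2;                   /* 110xxxxx */
    if ((s[0] & 0xF0) == 0xE0)
        return 3;                   /* 1110xxxx */
    if ((s[0] & 0xF8) == 0xF0)
        return 4;                   /* 11110xxx */
    return 0;                       /* stray continuation or invalid byte */
}

/* Hypothetical toy matcher: '.' in pat matches exactly one UTF-8
 * character, every other pattern byte matches itself. The match is
 * anchored at both ends. Returns 1 on a match, 0 otherwise. */
int match_chars(const char *pat, const char *text)
{
    const unsigned char *p = (const unsigned char *)pat;
    const unsigned char *t = (const unsigned char *)text;

    while (*p != '\0') {
        if (*t == '\0')
            return 0;               /* text ended before the pattern */
        if (*p == '.') {
            size_t n = utf8_seq_len(t);
            size_t i;

            if (n == 0)
                return 0;           /* invalid lead byte */
            /* Each trailing byte must have the form 10xxxxxx. A NUL
             * fails this check, so we never read past the string. */
            for (i = 1; i < n; i++)
                if ((t[i] & 0xC0) != 0x80)
                    return 0;
            t += n;                 /* skip the whole character */
        } else {
            if (*p != *t)
                return 0;
            t++;
        }
        p++;
    }
    return *t == '\0';
}
```

With this sketch, match_chars("x.z", "x\xD0\xB4z") succeeds because the
two bytes D0 B4 (Cyrillic "д") are consumed as one character, whereas a
byte-wise "." would need the pattern "x..z" for the same text.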
--
hundred-and-one symptoms of being an internet addict:
38. You wake up at 3 a.m. to go to the bathroom and stop and check your e-mail
on the way back to bed.
/// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net \\\
/// Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim \\\
\\\ Project leader for A-A-P -- http://www.a-a-p.org ///
\\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/