Re: [Dovecot] pigeonhole, regex, UTF-8

Trever L. Adams Tue, 13 Jul 2010 10:52:18 -0700

 On 07/13/2010 10:16 AM, Stephan Bosch wrote:

The standard regexp library does not support Unicode and I was not planning to write my own regexp compiler any time soon.

I wouldn't want to write one either.

As a matter of fact, I haven't looked at TRE before. I'm quite interested though, since it is backwards compatible with POSIX and seems to be available on most systems. I'll give it a closer look, also in terms of compatibility with the latest draft of the Sieve regex extension specification.
Regards,

Stephan.

There are a few odd things about the wide-character support in TRE. Either you need to convert each message to wchar_t and make sure you set the system encoding to wchar_t, or you need to set the system encoding for each message, which may or may not mess up your UTF-8 regex.

My project is an Internet Classifier (used with proxies such as Squid to build a filter). I convert everything to wchar_t (using iconv with the charset information gathered from the headers) and use TRE's wide-character versions of the regex functions. That way I know everything is consistent. I then have the program set the system encoding (at least the environment variable for the given session) to UTF-8 before I do any of the regex compiling. Everything works wonderfully and quite quickly.
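A minimal sketch of the conversion step described above, assuming glibc or GNU libiconv (both accept "WCHAR_T" as an iconv target name) and assuming the charset name has already been pulled out of the Content-Type header. The helper name to_wide is hypothetical, not part of TRE or Dovecot. The resulting wide string is what would then be passed to TRE's wide-character functions such as tre_regwexec, which are not shown here.

```c
#include <iconv.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert `len` bytes of `in`, encoded in `charset`
 * (for example the charset= parameter of the Content-Type header), into a
 * newly allocated, NUL-terminated wide string. Returns NULL on failure.
 * "WCHAR_T" is the target encoding name accepted by glibc and GNU libiconv;
 * other iconv implementations may need a different name. */
wchar_t *to_wide(const char *in, size_t len, const char *charset)
{
    iconv_t cd = iconv_open("WCHAR_T", charset);
    if (cd == (iconv_t)-1)
        return NULL;               /* unknown or unsupported charset */

    /* Assume each input byte decodes to at most one wide character;
     * if that ever fails, iconv reports E2BIG and we return NULL. */
    wchar_t *out = malloc((len + 1) * sizeof(wchar_t));
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;
    size_t src_left = len;
    char *dst = (char *)out;
    size_t dst_left = len * sizeof(wchar_t);   /* keep one slot for L'\0' */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) {        /* invalid, truncated or oversized input */
        free(out);
        return NULL;
    }
    *(wchar_t *)dst = L'\0';       /* terminate after the last character */
    return out;
}
```

Doing the conversion once per message means the matching side only ever sees wchar_t text, so the patterns never have to care which charset the message arrived in.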

I am not sure TRE is available on all systems where Dovecot is designed to be compiled. I know it is available for most, if not all, Unix-like systems. I use it on Fedora.


Anyway, thank you for your work on Pigeonhole.

Trever
