Re: [Dbmail] recipient enconding in TB sent folder

Paul J Stevens Sun, 13 May 2007 15:16:52 -0700

Aaron Stone wrote:
> On Sat, May 12, 2007, Anne <[EMAIL PROTECTED]> said:


>> Does this mean when someone is still at 2.2.2, one should wait for 2.2.6 ?

Maybe, but with some luck not. Read on.

> The encoding issues won't be a regression in 2.2.5, just an ongoing issue
> that will take too long to fix to meet the intended schedule.

Well, it turns out there *was* a regression that looked like something
else at first.

> 
> Something Paul may need to clarify is if there will be any differences in
> how data is stored in 2.2.2 vs. the upcoming 2.2.5 -- would there be a
> problem if a site upgraded to 2.2.5, found bugs, then reverted back to
> 2.2.2?

Ok. Reverting back will be possible. The changes affect the headercache
only, which can easily be rebuilt using dbmail-util.

However, building on patches offered by Anton and others we now have
reliable utf8 support in the headercache and the code that hooks into
that.

Some history:

Before 2.2.3, the 2.1.x and 2.2.x releases stored headervalues 'as-is'.
This meant that headers were encoded in utf7 if they contained
non-us-ascii characters. But quite early on Lars and others complained
this broke searching and sorting on the headers. They were right:
storing headers as-is does not work for searching and sorting, and there
is another problem as well. Widely used mailclients quite often send
email that contains illegal non-ascii 8bit characters, and there is
simply no way you can store those reliably in a us-ascii database and
expect good things to come of it. Still, that was the status-quo ante in
dbmail and users have complained about it from the beginning, especially
if they were using postgres with a non-US-ASCII encoded database.

Since 2.2.3, however, all headervalues are stored in the encoding
specified in the 'ENCODING' configuration field, which is supposed to map
one-to-one to the encoding the database uses internally. This means
users can now store all headers using UTF8 encoding. And that is very good.
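
To illustrate why storing decoded values helps searching, here is a
minimal sketch in Python. It is not dbmail code. It assumes the wire
form is an RFC 2047 encoded-word, which is what Python's standard email
package produces, and uses that as a stand-in for the encoded form
described above. The helper name decode_header_value is made up for
this example.

```python
from email.header import Header, decode_header, make_header


def decode_header_value(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in a raw header value to Unicode.

    Hypothetical helper for illustration only; not part of dbmail.
    """
    return str(make_header(decode_header(raw)))


# Build an encoded header the way a mail client might send it.
raw_subject = Header("Grüße aus Köln", "utf-8").encode()

# What a utf8 headercache would hold after decoding.
stored = decode_header_value(raw_subject)

# A substring search on the raw, encoded form cannot match,
# because the encoded form is pure ASCII.
print("Köln" in raw_subject)

# The same search on the decoded value does match.
print("Köln" in stored)
```

Sorting benefits in the same way: the database collates the decoded
text instead of the encoded bytes.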

Today, I managed to fix the subtle bug that broke the FETCH response the
OP mentions. Turns out Anton's original patch did some utf8 magic on the
strings before storing them to improve collation (used in sorting), but
doing that was not required (since the database can handle collation
just fine) and it also broke recoding the headervalues back into utf7
(which is what the mailclients want to see).

But I just landed a couple of patches that clean up the utf8 conversion
framework and remove the bug. All is well again now, but...

Mysql and postgresql users using anything but ENCODING=utf8 should take
notice. Dbmail uses utf8 internally because gmime uses utf8 internally.
You should too. Convert your databases if you can, or plan to do so
soon. Unicode and its UTF8 representation are the future.

Converting your mysql database is trivial, but takes time and table
locks (making the database read-only). A couple of 'ALTER' statements
got me going just fine.
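
As a rough sketch of what such statements could look like, the Python
snippet below only prints candidate statements; it does not touch any
database. The exact statements used are not given above, so the
CONVERT TO CHARACTER SET form is an assumption, as are the table names
and the helper name utf8_alter_statements. Check the output against
your own schema and take a backup before running anything.

```python
def utf8_alter_statements(tables, charset="utf8"):
    """Return one ALTER TABLE statement per table name.

    Hypothetical helper for illustration; it only builds strings.
    """
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset};"
        for table in tables
    ]


# Example table names only; list the tables from your own schema here.
tables = ["dbmail_headername", "dbmail_headervalue"]

for statement in utf8_alter_statements(tables):
    print(statement)
```

Note that CONVERT TO CHARACTER SET re-encodes the existing data, which
is only correct if the data currently matches the declared encoding of
the table.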

Converting your postgres database is non-trivial. In fact I'm not even
sure how hard it is. Someone fill me in please.

I'll hold 2.2.5 for just a day or two extra to see how this latest patch
works out. If there are unforeseen problems, I'll rewind my tree a bit
and release svn revision 2564 as 2.2.5, or branch out from there if
anything needs to go in.

So testing and time will tell. Stay tuned.


-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
