Re: [Dbmail-dev] mail content-type and database encoding

Robert Fleming Fri, 11 Nov 2005 00:39:13 +0100 (CET)

Aaron Stone wrote:

On Thu, 2005-11-10 at 13:26 -0800, Robert Fleming wrote:

This has been discussed a couple times before. I've tried to summarize it here:
http://www.dbmail.org/dokuwiki/doku.php?id=unicode_postgresql_database


Bug 218 had to do with problems with Unicode encoding prior to
PostgreSQL 8.1. But everything else sounds like it's to do with the very
nature of proper encodings in general. Is there still a version
dependent component to this issue?

I'm not sure that bug 218 was related to the Unicode fixes in PostgreSQL 8.1 -- attempting to store an ISO 8859-1 string (with octets > 127) in a UNICODE database would fail with all recent versions of PostgreSQL. But at the same time I can't make out what exactly happened for the bug reporter. His message was "Content-Transfer-Encoding: 7bit", thus /should/ not have had any octets outside the US-ASCII range -- thus would be storable in a UNICODE db.
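To illustrate the distinction, here is a minimal Python sketch (not dbmail or PostgreSQL code; the helper name `is_valid_utf8` and the sample octets are made up for this example). It only shows that an ISO 8859-1 octet above 127 does not form a valid UTF-8 sequence on its own, while pure US-ASCII octets always do:

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets decode cleanly as UTF-8."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample data: "cafe" with an ISO 8859-1 e-acute (0xE9),
# and a 7bit (US-ASCII only) variant of the same word.
latin1_body = b"caf\xe9"
ascii_body = b"cafe"

latin1_ok = is_valid_utf8(latin1_body)  # lone 0xE9 is an incomplete sequence
ascii_ok = is_valid_utf8(ascii_body)    # US-ASCII is a subset of UTF-8
```

A UNICODE database that validates its input would be expected to reject the first string and accept the second.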

It seems to me that these are all the same problem: putting an invalid UTF-8 sequence in a "text" field in a database with UNICODE encoding. IMHO, the database should be asked to just store raw octets as they're received from the Internet (as you mentioned, there are no guarantees that received messages will not have encoding anomalies). So asking the database to do automatic encoding conversions via the "client encoding" mechanism is just going to cause problems (would need to guarantee perfect round-tripping of conversions, e.g. to preserve digital signatures).
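A rough sketch of the round-tripping concern, again in Python and under stated assumptions: the message below is invented, and the lossy decode/re-encode step is only a stand-in for any conversion that cannot reproduce the original octets (it is not what PostgreSQL's client encoding actually does). The point is that a digest computed over the received octets no longer matches once the octets have been altered:

```python
import hashlib

# Hypothetical received message containing one ISO 8859-1 octet (0xE9).
raw = b"Subject: test\r\n\r\ncaf\xe9\r\n"

# Stand-in for a lossy conversion: the invalid octet is replaced
# with U+FFFD and the result is re-encoded as UTF-8.
converted = raw.decode("utf-8", errors="replace").encode("utf-8")

# Storing and returning the raw octets unchanged (e.g. in a binary
# column) trivially preserves them.
stored_raw = bytes(raw)

digest_received = hashlib.sha256(raw).hexdigest()
digest_converted = hashlib.sha256(converted).hexdigest()
digest_stored_raw = hashlib.sha256(stored_raw).hexdigest()

signature_survives_conversion = digest_received == digest_converted
signature_survives_raw_storage = digest_received == digest_stored_raw
```

Any signature scheme that hashes the message body would fail in the first case and succeed in the second.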

I would say that in general, this dbmail issue is not dependent on PostgreSQL version because no recent PostgreSQL version would have allowed illegal UTF-8 sequences in UNICODE databases.


Robert
