On 24.07.2013 at 00:24, Axel Rau <[email protected]> wrote:

> On 21.07.2013 at 05:35, Phil Pennock <[email protected]> wrote:
> 
>> On 2013-07-20 at 19:05 +0200, Axel Rau wrote:
>>> As exim works with utf-8 strings, my naive assumption was that a header
>>> like
>>>     Subject: Neue =?ISO-8859-1?q?Gl=E4ser?=
>>> (RFC 2047) will be converted to utf-8 by exim before I access it via
>>> $h_Subject: .
>>> Looking at the complexity of expand.c, this seems to be the case.
>>> Can anybody confirm this?
>> 
>> Exim's behaviour depends upon what value was defined for HEADERS_CHARSET
>> in Local/Makefile when Exim was built.  You also need HAVE_ICONV=yes but
>> that's supplied by default on some OSes.
>> 
>> The sample configuration supplied in src/EDITME sets
>> HEADERS_CHARSET="ISO-8859-1".
>> 
>> For myself, I always set HEADERS_CHARSET="UTF-8".
> 
> Indeed, the FreeBSD ports system defaults to ISO-8859-1.
> I reinstalled with UTF-8. Unfortunately, exim -bV does not list the
> HEADERS_CHARSET.
> From my simple tests, it seems to work:
>       Subject: TEST =?ISO-8859-1?q?Gl=FCckliche_m=F6gliche_=C4chtung?=
> was recorded correctly in UTF-8 in the DB.
>> 
>>> If the header contains non-ASCII 8-bit characters (= illegal), I would like
>>> exim to replace them with "?".
>>> Can this be done in the exim config or do we need a new expansion function
>>> for that?
>> 
>> I *suspect* that a new expansion function would be needed, but I could
>> be proven wrong by a particularly clever hack.  I also suspect that, if
>> we were to implement this, we'd default the replacement character to be
>> codepoint 0xFFFD, the Unicode REPLACEMENT CHARACTER.
> 
> 
> Wouldn't this be a reasonable enhancement of the existing conversion
> functionality anyway?

After 10 days of running with HEADERS_CHARSET="UTF-8" in Local/Makefile and
PQsetClientEncoding(pg_conn, "UTF8"); in lookups/pgsql.c, I still get tons of
'invalid byte sequence for encoding "UTF8"' errors (as expected from malformed
mails). I would like to prepare a patch for a bug report that replaces each
illegal sequence with the Unicode REPLACEMENT CHARACTER.

Is there any place in the exim code base where this information is available?
(I looked at rfc2047.c, expand.c…)

I need a solution in order to log header contents in a pgsql backend.
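
The replacement described above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not Exim code: the names
utf8_seq_len and utf8_sanitize are hypothetical, the caller is assumed to
provide an output buffer of at least 3 * inlen + 1 bytes, and every byte that
does not start a well-formed UTF-8 sequence is replaced individually by
U+FFFD (bytes EF BF BD).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Exim: return the length (1..4) of the
   well-formed UTF-8 sequence starting at s, or 0 if there is none.
   n is the number of bytes available at s (must be >= 1). */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c = s[0];
    unsigned long cp = 0;
    size_t len, i;

    if (c < 0x80) return 1;                                   /* ASCII */
    else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else return 0;              /* stray continuation or invalid lead byte */

    if (n < len) return 0;      /* sequence truncated at end of input */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    if (len == 3 && cp < 0x800) return 0;            /* overlong encoding */
    if (len == 4 && cp < 0x10000) return 0;          /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* UTF-16 surrogate */
    if (cp > 0x10FFFF) return 0;                     /* beyond Unicode */
    return len;
}

/* Hypothetical helper, not part of Exim: copy in[0..inlen) to out,
   replacing each offending byte by U+FFFD. out must hold at least
   3 * inlen + 1 bytes. Returns the number of bytes written, not counting
   the terminating NUL. */
size_t utf8_sanitize(const unsigned char *in, size_t inlen, unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        size_t len = utf8_seq_len(in + i, inlen - i);
        if (len == 0) {
            out[o++] = 0xEF; out[o++] = 0xBF; out[o++] = 0xBD;  /* U+FFFD */
            i++;                         /* skip only the offending byte */
        } else {
            memcpy(out + o, in + i, len);  /* copy valid sequence as-is */
            o += len;
            i += len;
        }
    }
    out[o] = '\0';
    return o;
}
```

Something like this would have to run on the header value before it is handed
to the pgsql lookup; where exactly that belongs in Exim is the open question.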

Thanks, Axel

---
PGP-Key:29E99DD6  ☀ +49 151 2300 9283  ☀ computing @ chaos claudius


