On Mon, 17 Oct 2005, David Woodhouse wrote:

> On Mon, 2005-10-03 at 13:03 +0100, Alan J. Flavell wrote:
> > Assuming that it's really impractical for exim to map everything it 
> > gets from MIME-encoded headers into some Unicode format for 
> > internal use,
> 
> Why would that be so impractical? 

We've already seen reports of utilities that are intended to parse 
exim log files but apparently get thrown by non-ASCII characters in 
the log.

> The log should be in UTF-8, 

That's an entirely defensible point of view; but log files get written 
from all kinds of places in the exim code.  I'd make two points:

1) ensuring that exim always writes its logs with valid utf-8 encoding 
would be a non-trivial exercise.  (And check the applicable Unicode 
rules relating to parsing invalid data streams claiming to be utf-8.)

2) introducing utf-8 logs "without the option" is liable to mess up 
some of the useful existing log-parsing tools, especially those coming 
from third parties.
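
As a rough illustration of what point (1) involves at each place a log 
line is written, here is a minimal sketch in Python (not exim code; the 
function name `to_valid_utf8` is purely hypothetical).  It forces 
arbitrary bytes into valid utf-8 by substituting U+FFFD for anything 
malformed, which is one common way of handling invalid sequences:

```python
def to_valid_utf8(raw: bytes) -> bytes:
    """Return `raw` as valid UTF-8 (hypothetical helper, not part of exim).

    Well-formed input passes through unchanged; malformed sequences are
    replaced by U+FFFD (REPLACEMENT CHARACTER) instead of being written
    to the log as-is.
    """
    # Decode leniently: invalid bytes become U+FFFD rather than raising.
    text = raw.decode("utf-8", errors="replace")
    # Re-encoding a Python str always produces well-formed UTF-8.
    return text.encode("utf-8")
```

Note that the substitution is lossy: the original bytes cannot be 
recovered from the log afterwards.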

> and it isn't particularly hard to convert.

I presume you mean by applying the Iconv library.  Would that mean
introducing an additional exim pre-requisite?  (I'm not aware of exim 
currently using Iconv, but maybe it already does...).

Anyhow, in a technical log it *might* be more productive to retain the 
original encoded form (for possible debugging purposes), rather than a 
form that's derived from it by some complex conversion.  I don't think 
the case for utf-8 (nor for any other Unicode encoding scheme, for 
that matter, though I'd agree that utf-8 looks the best choice if such 
a choice has to be made) is so completely cut-and-dried as you 
presented it.
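
To make the trade-off concrete, here is a small sketch (Python standard 
library only; `describe_header` and the "original [decoded]" layout are 
my own assumptions, not anything exim does) that keeps the original 
MIME-encoded form and puts a decoded rendering beside it:

```python
from email.header import decode_header


def describe_header(encoded: str) -> str:
    """Return 'original [decoded]' for a possibly MIME-encoded header value.

    Hypothetical helper: the original encoded-word is kept verbatim for
    debugging, and the decoded text is shown next to it.
    """
    parts = []
    for chunk, charset in decode_header(encoded):
        if isinstance(chunk, bytes):
            try:
                parts.append(chunk.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # Charset label unknown to Python: fall back to lenient ASCII.
                parts.append(chunk.decode("ascii", errors="replace"))
        else:
            # Unencoded headers can come back as str already.
            parts.append(chunk)
    return f"{encoded} [{''.join(parts)}]"


print(describe_header("=?iso-8859-1?q?caf=E9?="))
```

The encoded form stays pure ASCII, so an existing log parser that only 
looks at that part of the line would be unaffected.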

Also, as an aside - I'm told that Han unification can lead to loss of 
information when CJK codings are converted "by rote" into a Unicode 
encoding.  But this isn't my field, so I won't try to go into detail.

But if it were to be agreed that it's the right thing to do, then I 
suppose that item (1) is best addressed in the course of a general 
tidying-up of exim log writing, as this topic has come up before, and 
Philip has conceded that there are inconsistencies in there (modulo 
the usual shortage of Round Tuits ;-).

> Btw, it's interesting to note that the original poster was sending an
> autoreply to a message with a spam score of 5.1. :)

We had some problems here when we were scoring Asian MIME encodings - 
several members of staff come from over there, and have correspondents 
they really /do/ want to communicate with, despite the torrent of spam 
we were trying to keep out.

best regards

-- 
## List details at http://www.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://www.exim.org/eximwiki/
