On Mon, 17 Oct 2005, David Woodhouse wrote:

> On Mon, 2005-10-03 at 13:03 +0100, Alan J. Flavell wrote:
> > Assuming that it's really impractical for exim to map everything it
> > gets from MIME-encoded headers into some Unicode format for
> > internal use,
>
> Why would that be so impractical?

We've already seen examples reported, of utilities which are intended
to parse exim log files and apparently get thrown by non-ASCII
characters in the log.

> The log should be in UTF-8,

That's an entirely defensible point of view; but log files get written
from all kinds of places in the exim code. I'd make two points:

1) ensuring that exim always writes its logs with valid utf-8 encoding
   would be a non-trivial exercise. (And check the applicable Unicode
   rules relating to parsing invalid data streams claiming to be
   utf-8.)

2) introducing utf-8 logs "without the option" is liable to mess up
   some of the useful existing log-parsing tools, especially those
   coming from third parties.

> and it isn't particularly hard to convert.

I presume you mean by applying the Iconv library. Would that mean
introducing an additional exim pre-requisite? (I'm not aware of exim
currently using Iconv, but maybe it already does...).

Anyhow, in a technical log it *might* be more productive to retain the
original encoded form (for possible debugging purposes), rather than a
form that's derived from it by some complex conversion. I don't think
the case for utf-8 (nor for any other Unicode encoding scheme, for that
matter, though I'd agree that utf-8 looks the best choice if such a
choice has to be made) is so completely cut-and-dried as you presented
it.

Also, as an aside - I'm told that Han unification can lead to loss of
information when CJK codings are converted "by rote" into a Unicode
encoding. But this isn't my field, so I won't try to go into detail.

But if it were to be agreed that it's the right thing to do, then I
suppose that item (1) is best addressed in the course of a general
tidying-up of exim log writing, as this topic has come up before, and
Philip has conceded that there are inconsistencies in there (modulo
the usual shortage of Round Tuits ;-).

> Btw, it's interesting to note that the original poster was sending an
> autoreply to a message with a spam score of 5.1.

:) We had some problems here when we were scoring Asian MIME
encodings - several members of staff come from over there, and have
correspondents they really /do/ want to communicate with, despite the
torrent of spam we were trying to keep out.

best regards
