Replying to myself. FYI.

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Kamenik, Aleksander
> Sent: Friday, October 08, 2010 6:36 PM
> To: DBMail mailinglist
> Subject: Re: [Dbmail] importing email with utf envoded subjects
> 
> > It is in UTF-8, but it is also violation of rfc822/rfc2047. Headers
> > must be encoded. Computer program can't detect used character set, if
> > sender does not specify which character set is used.
> 
> I was aware of that. However these emails exist and I can't get Outlook
> to change anyway.

These emails are created when Outlook generates them internally, for example a
message from Office Communicator which states "Missed conversation with
somebody, somebody else, etc". If one of the names contains characters with
umlauts, it is simply written as raw UTF-8 without the charset being declared.

There are also emails sent from other Outlook clients via the same Exchange
server that have raw UTF-8 subjects when they contain, for example, umlauts.

> > It is highly unlikely that
> > all your malformed emails are in utf-8. You can have a mix of utf-8,
> > iso-8859-1, iso-8859-13, iso-8859-15, windows-1252, windows-1257 and
> > other character sets. Older Estonian emails are probably not in utf-8.
> > If you try to fix all 8bit subjects, you will break malformed
> > iso-8859-x Estonian texts that look ok in Outlook now.
> 
> The problem is only with the Subject header on a subset of emails. The
> bodies look OK before and after conversion. UTF8 subjects break after
> conversion from pst to dbmail via mbox.
> 
> My plan if all else fails is to generate the mbox files and then go
> through them searching for the UTF8 subjects and recode these only. Or
> modify libpst to do that which would be more efficient.

And so I did. An awk script pipes each subject to a shell script that does the
charset check and converts the subject to quoted-printable format using
mmencode. I have a solution.
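
For anyone who prefers a single script, here is a minimal Python sketch of the
same check-and-convert step. It is not the awk/shell script described above.
The function name recode_subject is made up for illustration, and it assumes
that "decodes cleanly as UTF-8" is an acceptable charset check. Anything that
is pure ASCII or not valid UTF-8 (for example old iso-8859-x subjects) is left
untouched.

```python
from email.charset import Charset, QP
from email.header import Header

# UTF-8 charset that forces the RFC 2047 "Q" (quoted-printable style) encoding.
_UTF8_Q = Charset("utf-8")
_UTF8_Q.header_encoding = QP


def recode_subject(raw_value: bytes) -> bytes:
    """Return an RFC 2047 encoded subject if raw_value is 8-bit UTF-8.

    Hypothetical helper: raw_value is the subject text without the
    "Subject:" prefix and without the trailing newline.
    """
    if raw_value.isascii():
        return raw_value              # nothing to fix
    try:
        text = raw_value.decode("utf-8")
    except UnicodeDecodeError:
        return raw_value              # some other charset, do not guess
    encoded = Header(text, charset=_UTF8_Q, header_name="Subject").encode()
    return encoded.encode("ascii")
```

Long subjects may come back folded over several lines, which is valid for a
header. The strict UTF-8 decode is only a heuristic, so test it on a sample of
your own mail before running it over a whole archive.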

If you ever do something similar, make sure to convert only the Subject:
header in the header block of the main email message. You'll find loads of
UTF-8 subjects in the message bodies as well, because Outlook usually quotes
inline, starting with the "-----Original Message-----" line and then quoting
the To, From, Sent and Subject headers in UTF-8. That is OK, as they are in
the body of the message.
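
The "main header only" rule can be sketched in Python roughly as follows.
This is an illustration under assumptions, not the script that was actually
used: fix_top_level_subjects and its recode argument are hypothetical names,
the mbox is assumed to escape body lines starting with "From " (as mboxo and
mboxrd do), and folded Subject continuation lines are not handled.

```python
from typing import Callable, Iterable, Iterator


def fix_top_level_subjects(
    lines: Iterable[bytes],
    recode: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Apply recode() to Subject: values in each message's main header only.

    lines  -- raw mbox lines as bytes, including their line endings
    recode -- hypothetical callable that turns a raw subject value into
              an encoded one (for example an RFC 2047 encoder)
    """
    in_header = False
    for line in lines:
        if line.startswith(b"From "):
            in_header = True            # mbox separator: a new message starts
        elif in_header and line.strip() == b"":
            in_header = False           # first blank line ends the main header
        elif in_header and line[:8].lower() == b"subject:":
            body = line.rstrip(b"\r\n")
            eol = line[len(body):]      # keep the original line ending
            line = line[:8] + b" " + recode(body[8:].strip()) + eol
        yield line
```

Subject lines that Outlook quoted inside the body come after the first blank
line of the message, so they pass through unchanged.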

libpst should do this IMHO, but I'm not versed enough in C to write a patch.

Regards,

Aleksander Kamenik
System Administrator
Krediidiinfo AS
an Experian Company
Phone: +372 665 9649
Email: [email protected]
_______________________________________________
DBmail mailing list
[email protected]
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail
