2010.10.08 17:17 Kamenik, Aleksander wrote:
>> You don't have to assume anything. Character set name is written in first
>> section of B|Q encoding. If character set name is not written and subject
>> is not encoded, it must be in us ascii.
>>
>> utf7 is rarely used for Subject. You have utf-8 (=?utf-8?b? or =?utf-8?q?)
>> or some unicode variant (unicode-#-# or unicode-#-#-some-text) or you
>> confuse Unicode with broken 8bit headers.
>
> These are from Outlook 2007 as far as I can tell. Not everybody follows
> standards. For example:
>
> # grep 'Subject: palun juur' mbox
> Subject: palun juurdepääsu
> # grep 'Subject: palun juur' mbox | hexdump -C
> 00000000  53 75 62 6a 65 63 74 3a  20 70 61 6c 75 6e 20 6a  |Subject: palun j|
> 00000010  75 75 72 64 65 70 c3 a4  c3 a4 73 75 0a           |uurdep....su.|
> 0000001d
> #

c3 a4  c3 a4

It is in UTF-8, but it is also a violation of RFC 822/RFC 2047: non-ASCII
header text must be encoded. A program can't detect which character set was
used if the sender does not specify it. It is highly unlikely that all your
malformed emails are in utf-8. You can have a mix of utf-8, iso-8859-1,
iso-8859-13, iso-8859-15, windows-1252, windows-1257 and other character
sets. Older Estonian emails are probably not in utf-8. If you try to fix all
8bit subjects by treating them as utf-8, you will break malformed iso-8859-x
Estonian texts that look OK in Outlook now.
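
To illustrate the ambiguity, here is a minimal Python sketch (Python's
standard email package is my choice for the example, it is not something
dbmail or libpst uses). The byte string is taken from the hexdump above; the
variable names and the iso-8859-1 sample are assumptions made up for the
illustration.

```python
from email.header import Header, decode_header

# Raw 8bit bytes of the word from the hexdump: "juurdep" c3 a4 c3 a4 "su"
raw = b"juurdep\xc3\xa4\xc3\xa4su"

# The same bytes decode without any error under several character sets,
# so a successful decode alone does not tell you which one was meant.
for charset in ("utf-8", "iso-8859-1", "iso-8859-15", "windows-1257"):
    print(charset, "->", raw.decode(charset))

# Hypothetical older message: the same word sent as raw iso-8859-1.
# Forcing utf-8 on it fails, which is how a blanket "fix" breaks such mails.
latin1_raw = "juurdepääsu".encode("iso-8859-1")
try:
    latin1_raw.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print("iso-8859-1 bytes valid as utf-8:", latin1_is_valid_utf8)

# What RFC 2047 expects instead: an encoded-word that names its charset.
encoded = Header("palun juurdepääsu", "utf-8").encode()
print(encoded)

# A reader then gets the charset from the header itself, no guessing.
payload, charset = decode_header(encoded)[0]
decoded = payload.decode(charset)
print(charset, "->", decoded)
```

A strict utf-8 decode that fails proves the header is not utf-8, but one that
succeeds is only a hint, so this can support a heuristic and nothing more.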

If those utf-8 emails looked OK in Outlook, maybe the problem is in libpst.

-- 
Tomas

_______________________________________________
DBmail mailing list
[email protected]
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail
