A NOTE has been added to this issue.
======================================================================
http://www.dbmail.org/mantis/view.php?id=548
======================================================================
Reported By:                idk
Assigned To:
======================================================================
Project:                    DBMail
Issue ID:                   548
Category:                   IMAP daemon
Reproducibility:            N/A
Severity:                   feature
Priority:                   normal
Status:                     new
target:
======================================================================
Date Submitted:             22-Mar-07 11:23 CET
Last Modified:              23-Mar-07 10:49 CET
======================================================================
Summary:                    WISH: Better parsing 8bit header characters
Description:
Only 7-bit characters are valid in mail header values, so accented
characters should be escaped. But occasionally I get a message from a
buggy mail client that ignores this rule.
In MSOE (Outlook Express) the message list shows an invalid subject (it
looks like UTF-8 bytes displayed as single-byte characters), but when the
message is opened the Subject header is displayed correctly (parsed from
the header part of the message). So I think a solution exists. MSOE under
Czech Windows uses code page 1250 by default, so one possibility is that
MSOE interpreted the Subject "correctly" from the whole message content;
the other is that it took the charset from the Content-Type header value
(see Additional Information). I think the second option should be
applicable to DBMail.
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000538 incorrect field cache values for messag...
======================================================================

----------------------------------------------------------------------
paul - 22-Mar-07 14:42
----------------------------------------------------------------------
This is exactly how it's done at the moment.

If a header is 8bit, the header string is converted to utf8. If the
content-type header contains a charset specification, dbmail will try to
convert from the specified charset to utf8. Else dbmail will fall back to
the charset specified in the DEFAULT_MSG_ENCODING config value and try to
convert the string to utf8, assuming the header was encoded in that
charset. If both fail, dbmail will replace all 8 bit characters with '?'.
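The conversion order paul describes can be sketched as follows. This is a hypothetical illustration in Python, not DBMail's C implementation; header_to_utf8 and its parameters are made-up names. It reads "if both fail" as meaning the DEFAULT_MSG_ENCODING charset is also tried when conversion from the Content-Type charset fails, and it treats an unknown charset name as a failed conversion.

```python
from typing import Optional


def header_to_utf8(raw: bytes,
                   content_type_charset: Optional[str],
                   default_msg_encoding: str) -> bytes:
    """Hypothetical sketch of the fallback chain described in paul's note.

    Not DBMail code: the name and signature are illustrative only.
    """
    # A 7-bit header value is already valid UTF-8; nothing to convert.
    if all(b < 0x80 for b in raw):
        return raw

    # 1) charset from Content-Type, 2) DEFAULT_MSG_ENCODING from dbmail.conf.
    for charset in (content_type_charset, default_msg_encoding):
        if not charset:
            continue
        try:
            return raw.decode(charset).encode("utf-8")
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes invalid in that charset: try the next one.
            continue

    # 3) Both conversions failed: replace every 8-bit byte with '?'.
    return bytes(b if b < 0x80 else ord("?") for b in raw)
```

In this sketch, with the default set to utf8 and no usable Content-Type charset, the windows-1250 byte 0xEC (ě) cannot be decoded and ends up as '?'; with windows-1250 as the default it becomes the UTF-8 sequence C4 9B.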
----------------------------------------------------------------------
idk - 22-Mar-07 16:25
----------------------------------------------------------------------
mysql> SELECT HEX(SUBSTRING(messageblk, 1087, 53)) FROM dbmail_messageblks WHERE physmessage_id = 273400 AND is_header = 1;
5375626A6563743A 20 566964656F70726F686C ED 646B61 20 76656C6574726875 20 72796261 F8 656E ED 20 76 20 42726E EC 20 32303037
(I added spaces around the 0x20 and >0x7F bytes.)

mysql> SELECT SUBSTRING(messageblk, 1087, 53) FROM dbmail_messageblks WHERE physmessage_id = 273400 AND is_header = 1;
Subject: Videoprohl?dka veletrhu ryba?en? v Brn? 2007

A001 UID FETCH 554133 (ENVELOPE)
* 97 FETCH (UID 554133 ENVELOPE ("Wed, 21 Mar 2007 18:09:41 +0100" "=?UTF-8?q?Videoprohl=C3=ADdka_veletrhu_ryba=C5=99en=C3=AD_?= =?iso-8859-2?q?v_Brn=EC?= 2007" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL "<[EMAIL PROTECTED]>"))
A001 OK UID FETCH completed

It seems OK, because UTF(C3 AD) == WIN(ED), UTF(C5 99) == WIN(F8) and
ISO(EC) == WIN(EC). Do you mean the bug is in the MSOE mail client? Does
MSOE recognize the =?UTF-8?q? prefix? Or is it the mix of UTF-8 and
ISO-8859-2 in one header?

I'll attach screenshots of this situation. Red underlining highlights the
wrong characters and green the "correct" ones (in msoe.jpg you can see a
font change from this position to the end of the line, including the
number 2007, but that seems to be an MSOE bug; squirrel (SquirrelMail
1.4.10 SVN) shows both wrong).

(Note for http://www.dbmail.org/mantis/view.php?id=538: I have revision
2471, default_msg_encoding=utf8.)

----------------------------------------------------------------------
paul - 22-Mar-07 16:56
----------------------------------------------------------------------
Now why are you using default_msg_encoding=utf8?? Try using windows-1250,
since you mentioned that is the charset that's causing the problems.
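idk's byte comparison in the 16:25 note can be checked mechanically. A minimal Python sketch, using only the hex string from the SELECT HEX(...) output: it decodes the raw Subject bytes as windows-1250 and prints, for each 8-bit byte, the character it stands for and that character's UTF-8 and ISO-8859-2 encodings.

```python
# Subject header bytes from the SELECT HEX(...) output in idk's 16:25 note.
# bytes.fromhex() ignores the spaces idk added around the interesting bytes.
raw = bytes.fromhex(
    "5375626A6563743A 20 566964656F70726F686C ED 646B61 20 76656C6574726875 "
    "20 72796261 F8 656E ED 20 76 20 42726E EC 20 32303037"
)

subject = raw.decode("windows-1250")
print(subject)  # Subject: Videoprohlídka veletrhu rybaření v Brně 2007

# For each distinct 8-bit byte: the windows-1250 character and its
# UTF-8 and ISO-8859-2 encodings.
for b in sorted({b for b in raw if b >= 0x80}):
    ch = bytes([b]).decode("windows-1250")
    utf8 = ch.encode("utf-8").hex(" ").upper()
    iso = ch.encode("iso-8859-2").hex().upper()
    print(f"WIN {b:02X}  {ch}  UTF-8 {utf8}  ISO-8859-2 {iso}")
```

For the three bytes involved (EC, ED, F8), windows-1250 and ISO-8859-2 agree, and the UTF-8 forms are C4 9B (ě), C3 AD (í) and C5 99 (ř), which match the =C3=AD and =C5=99 escapes and the ISO =EC escape in the FETCH response.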
----------------------------------------------------------------------
idk - 22-Mar-07 23:48
----------------------------------------------------------------------
Why do I have UTF8 as the default? Because you told me to :) In bug
http://www.dbmail.org/mantis/view.php?id=265 you wrote:

  ... you do need to change dbmail.conf and add two new entries:
  encoding=utf8
  default_msg_encoding=utf8

So I did. Nevertheless, I have tried changing it to WINDOWS-1250, with the
same result.

Regardless of the default charset, there is an inconsistency between the
cached header value (dbmail_headervalue.headervalue, TEXT utf8_general_ci)
and the binary content of all headers (dbmail_messageblks.messageblk,
LONGBLOB BINARY):

mysql> SELECT HEX(headervalue) FROM dbmail_headervalue WHERE id = 607434;
566964656F70726F686CC383C2AD646B612076656C65747268752072796261C385E284A2656EC383C2AD20762042726EC384E280BA2032303037

So:
Videoprohl   56 69 64 65 6F 70 72 6F 68 6C
i_acute      C3 83 C2 AD
dka          64 6B 61
(space)      20
veletrhu     76 65 6C 65 74 72 68 75
(space)      20
ryba         72 79 62 61
r_caron      C3 85 E2 84 A2
en           65 6E
i_acute      C3 83 C2 AD
(space)      20
v            76
(space)      20
Brn          42 72 6E
e_caron      C3 84 E2 80 BA
(space)      20
2007         32 30 30 37

----------------------------------------------------------------------
idk - 23-Mar-07 00:10
----------------------------------------------------------------------
Ohh, where is the rest of my previous comment?? I spent about two hours
writing it during some tests, and I have no backup.... :((( I wrote a lot
of step-by-step details on how to reproduce this... Gah... Grrrrh
(Probably because I pasted an IMAP result from a Linux shell window with
only \n at the end of each line?)

So, briefly: there are two issues.
First, possibly double encoding into UTF-8 (see i_acute C383C2AD, for
example; the correct value is C3AD). Second, multiple encodings in one
header value. I created two more copies of this message, replaced 8-bit
characters with a triple $$$ and re-injected them via sendmail: one copy
has the first three characters replaced (previously encoded as UTF/q),
the other has only the last character replaced (previously encoded as
ISO/q). Now the copy that kept the three UTF-8 characters is encoded as
UTF/b and the copy that kept only the last character as ISO/q, see below.

Is there a chance for IMAP to send each header value with only one
encoding? When IMAP sends only one encoding (no switching to another
charset, all characters under one encoding), both MSOE and SquirrelMail
display the header value in the message list correctly.

----------------------------------------------------------------------
idk - 23-Mar-07 00:10
----------------------------------------------------------------------
A001 UID FETCH 555258,555260 (ENVELOPE)
* 95 FETCH (UID 555258 ENVELOPE ("Wed, 21 Mar 2007 18:09:41 +0100" "=?UTF-8?b?VmlkZW9wcm9obMOtZGthIHZlbGV0cmh1IHJ5YmHFmWVuw60=?= v Brn$$$ UTF 2007" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL "<[EMAIL PROTECTED]>"))
* 96 FETCH (UID 555260 ENVELOPE ("Wed, 21 Mar 2007 18:09:41 +0100" "Videoprohl$$$dka =?iso-8859-2?q?veletrhu_ryba$$$en$$$_v_Brn=EC?= ISO 2007" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL "<[EMAIL PROTECTED]>"))
A001 OK UID FETCH completed

----------------------------------------------------------------------
idk - 23-Mar-07 01:50
----------------------------------------------------------------------
One more note: I sent two modified messages, the first with raw 8-bit
characters and the second with MIME ISO escaping. In both cases DBMail
stores the same value in headervalue, but the envelopes are different.
The first envelope (with mixed encodings) is displayed incorrectly.

mysql> SELECT HEX(headervalue) FROM dbmail_headervalue WHERE physmessage_id IN (274354,274353) AND headername_id = 7;
C3AD20C599C3AD2078207820782078207820782078207820782078207820782078207820C49B
C3AD20C599C3AD2078207820782078207820782078207820782078207820782078207820C49B

mysql> SELECT envelope FROM dbmail_envelope WHERE physmessage_id IN (274354,274353);
("Wed, 21 Mar 2007 18:09:41 +0100" "=?UTF-8?b?w60gxZnDrSA=?= =?iso-8859-2?q?x_x_x_x_x_x_x_x_x_x_x_x_x_x_=EC?=" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL "<[EMAIL PROTECTED]>")
("Wed, 21 Mar 2007 18:09:41 +0100" "=?iso-8859-2?Q?=ED_=F8=ED_x_x_x_x_x_x_x_x_x_x_x_x_x_x_=EC?=" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL "<[EMAIL PROTECTED]>")

----------------------------------------------------------------------
maximP - 23-Mar-07 08:16
----------------------------------------------------------------------
Maybe it's better to have two fields for header values: UTF-8 (for
searching) and 7-bit (for FETCH).

----------------------------------------------------------------------
AntonZ - 23-Mar-07 10:49
----------------------------------------------------------------------
Please attach the source of this message; I'm interested in the original
Subject field. g_mime_utils_header_encode_text encodes the full string;
multiple encodings can only be created by
g_mime_utils_header_encode_phrase.
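Both issues idk raises (the double-encoded bytes in dbmail_headervalue from the 23:48 note, and the Subject split across a UTF-8 and an ISO-8859-2 encoded word in the 16:25 FETCH) can be reproduced with Python's standard library. This is a sketch, not DBMail code: DBMail uses GMime, and to_unicode and reencode_single_charset are hypothetical helper names. The double-encoding part assumes the correct UTF-8 bytes were read once more as windows-1252 and re-encoded; that assumption reproduces the stored bytes exactly, and since MySQL's latin1 character set is in practice windows-1252, a latin1 client or connection charset would be one plausible cause, though nothing in this thread establishes it.

```python
from email.header import Header, decode_header

# --- Issue 1: double encoding in dbmail_headervalue (idk, 22-Mar-07 23:48) ---
stored = bytes.fromhex(
    "566964656F70726F686CC383C2AD646B612076656C65747268752072796261"
    "C385E284A2656EC383C2AD20762042726EC384E280BA2032303037"
)
original = "Videoprohlídka veletrhu rybaření v Brně 2007"

# Assumption: the correct UTF-8 bytes were decoded once more as windows-1252
# and the resulting text was encoded to UTF-8 again.
double_encoded = original.encode("utf-8").decode("windows-1252").encode("utf-8")
# Reversing the same two steps recovers the intended text.
repaired = stored.decode("utf-8").encode("windows-1252").decode("utf-8")


# --- Issue 2: several charsets in one header value (idk, 22-Mar-07 16:25) ---
def to_unicode(value: str) -> str:
    """Decode an RFC 2047 header value that may mix charsets (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):  # decode_header returns bytes when encoded words are present
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


def reencode_single_charset(value: str, charset: str = "utf-8") -> str:
    """Re-encode the whole decoded value with one charset only (hypothetical helper)."""
    return Header(to_unicode(value), charset).encode()


mixed = ("=?UTF-8?q?Videoprohl=C3=ADdka_veletrhu_ryba=C5=99en=C3=AD_?= "
         "=?iso-8859-2?q?v_Brn=EC?= 2007")

print(double_encoded == stored)       # True
print(repaired == original)           # True
print(to_unicode(mixed))              # Videoprohlídka veletrhu rybaření v Brně 2007
print(reencode_single_charset(mixed))
```

In terms of maximP's suggestion, to_unicode yields the kind of value one would keep for searching, while reencode_single_charset produces a 7-bit value in which every character of the header sits under one charset, the form idk reports both MSOE and SquirrelMail display correctly.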
Issue History
Date Modified    Username       Field                    Change
======================================================================
22-Mar-07 11:23  idk            New Issue
22-Mar-07 14:42  paul           Note Added: 0001935
22-Mar-07 14:42  paul           Relationship added       related to 0000538
22-Mar-07 16:25  idk            Note Added: 0001936
22-Mar-07 16:28  idk            File Added: msoe.jpg
22-Mar-07 16:28  idk            File Added: squirrel.jpg
22-Mar-07 16:56  paul           Note Added: 0001937
22-Mar-07 23:48  idk            Note Added: 0001939
23-Mar-07 00:10  idk            Note Added: 0001940
23-Mar-07 00:10  idk            Note Added: 0001941
23-Mar-07 01:50  idk            Note Added: 0001942
23-Mar-07 08:16  maximP         Note Added: 0001945
23-Mar-07 10:49  AntonZ         Note Added: 0001947
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://twister.fastxs.net/mailman/listinfo/dbmail-dev