A NOTE has been added to this issue. 
====================================================================== 
http://www.dbmail.org/mantis/view.php?id=548 
====================================================================== 
Reported By:                idk
Assigned To:                
====================================================================== 
Project:                    DBMail
Issue ID:                   548
Category:                   IMAP daemon
Reproducibility:            N/A
Severity:                   feature
Priority:                   normal
Status:                     new
target:                      
====================================================================== 
Date Submitted:             22-Mar-07 11:23 CET
Last Modified:              23-Mar-07 10:49 CET
====================================================================== 
Summary:                    WISH: Better parsing 8bit header characters
Description: 
Only 7-bit characters are valid in mail header values, so accented
characters should be escaped. Occasionally, however, I receive a message
from a buggy mail client that ignores this rule.

In MSOE's message list the subject of such a message is displayed
incorrectly (it looks like UTF8-encoded text rendered one byte per
character), but when the message is opened the Subject header is displayed
correctly (parsed from the header part of the message). So I think a
solution exists.

MSOE under Windows (CZE) uses code page 1250 by default, so that is one way
MSOE may have interpreted the Subject "correctly" from the full message
content; the other is that it takes the charset from the Content-Type
header value (see Additional Information).

I think the second option should be applicable to DBMail.
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000538 incorrect field cache values for messag...
====================================================================== 

---------------------------------------------------------------------- 
 paul - 22-Mar-07 14:42  
---------------------------------------------------------------------- 
This is exactly how it's done at the moment. 

If a header is 8bit, the header string is converted to utf8.
If the content-type header contains a charset specification, dbmail will
try to convert from the specified charset to utf8.
Otherwise dbmail will fall back to the charset specified in the
DEFAULT_MSG_ENCODING config value and try to convert the string to utf8,
assuming the header was encoded in that charset.
If both fail, dbmail will replace all 8-bit characters with '?'. 
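
The fallback order described above can be sketched roughly as follows. This
is only an illustration in Python, not DBMail's actual C code: the function
name header_to_utf8 and its parameters are hypothetical, and "converted to
utf8" is modelled here as returning a Python str.

```python
from typing import Optional


def header_to_utf8(raw: bytes,
                   declared_charset: Optional[str] = None,
                   default_charset: str = "utf-8") -> str:
    """Hypothetical sketch of the described fallback chain."""
    # Pure 7-bit headers need no conversion at all.
    if all(b < 0x80 for b in raw):
        return raw.decode("ascii")

    # 1. charset from the Content-Type header, 2. configured default
    #    (stands in for DEFAULT_MSG_ENCODING).
    for charset in (declared_charset, default_charset):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset: try the next candidate

    # 3. Both attempts failed: replace every 8-bit byte with '?'.
    return bytes(b if b < 0x80 else ord("?") for b in raw).decode("ascii")


if __name__ == "__main__":
    raw = b"v Brn\xec"  # 0xEC is e-caron in ISO 8859-2 and Windows-1250
    print(header_to_utf8(raw, "iso-8859-2"))
    print(header_to_utf8(raw, None, "utf-8"))
```

With a default of utf8 and no usable Content-Type charset, a raw
Windows-1250 subject is not valid UTF8, so the last step applies and the
accented characters come out as '?'.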

---------------------------------------------------------------------- 
 idk - 22-Mar-07 16:25  
---------------------------------------------------------------------- 
mysql> SELECT HEX(SUBSTRING(messageblk, 1087, 53)) FROM dbmail_messageblks
WHERE physmessage_id = 273400 AND is_header = 1;

5375626A6563743A 20 566964656F70726F686C ED 646B61 20 76656C6574726875 20
72796261 F8 656E ED 20 76 20 42726E EC 20 32303037

(Spaces added around each \x20 and each byte above \x7F.)

mysql> SELECT SUBSTRING(messageblk, 1087, 53) FROM dbmail_messageblks
WHERE physmessage_id = 273400 AND is_header = 1;

Subject: Videoprohl?dka veletrhu ryba?en? v Brn? 2007

A001 UID FETCH 554133 (ENVELOPE)
* 97 FETCH (UID 554133 ENVELOPE ("Wed, 21 Mar 2007 18:09:41 +0100"
"=?UTF-8?q?Videoprohl=C3=ADdka_veletrhu_ryba=C5=99en=C3=AD_?=
=?iso-8859-2?q?v_Brn=EC?= 2007" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL
"chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL
"undisclosed-recipients" NIL)) NIL NIL NIL
"<[EMAIL PROTECTED]>"))
A001 OK UID FETCH completed


It seems OK, because UTF(C3 AD) == WIN(ED), UTF(C5 99) == WIN(F8), and
ISO(EC) == WIN(EC). Do you mean the bug is in the MSOE mail client? Does MSOE
recognize the =?UTF-8?q? prefix? Or is the problem the mix of UTF8 and
ISO 8859-2?
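
The byte equivalences above can be checked with a few lines of Python. This
is just a verification sketch; it assumes that "WIN" means Windows-1250
(Python codec cp1250) and "ISO" means ISO 8859-2, and the helper name
decode_hex is made up for the example.

```python
# Raw header bytes as shown in the HEX() output (53 bytes).
RAW_HEX = (
    "5375626A6563743A 20 566964656F70726F686C ED 646B61 20 "
    "76656C6574726875 20 72796261 F8 656E ED 20 76 20 42726E EC 20 32303037"
)
RAW = bytes.fromhex(RAW_HEX)


def decode_hex(hex_bytes: str, charset: str) -> str:
    """Decode a hex string such as 'C3 AD' using the given charset."""
    return bytes.fromhex(hex_bytes).decode(charset)


if __name__ == "__main__":
    # The stored header, read as Windows-1250.
    print(RAW.decode("cp1250"))

    # Each pair should name the same character.
    print(decode_hex("C3 AD", "utf-8") == decode_hex("ED", "cp1250"))
    print(decode_hex("C5 99", "utf-8") == decode_hex("F8", "cp1250"))
    print(decode_hex("EC", "iso-8859-2") == decode_hex("EC", "cp1250"))
```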

I'll attach screenshots of this situation. Red underlining highlights the
wrong characters and green the "correct" ones. (In msoe.jpg you can see a
font change from this position to the end of the line, including the number
2007, but that looks like an MSOE bug; squirrel (SquirrelMail 1.4.10 SVN)
shows both wrong.)

(Note for http://www.dbmail.org/mantis/view.php?id=538: I am running
revision 2471 with default_msg_encoding=utf8.) 

---------------------------------------------------------------------- 
 paul - 22-Mar-07 16:56  
---------------------------------------------------------------------- 
Now why are you using default_msg_encoding=utf8?? Try using windows-1250
since you mentioned that is the charset that's causing the problems. 

---------------------------------------------------------------------- 
 idk - 22-Mar-07 23:48  
---------------------------------------------------------------------- 
Why am I using UTF8 as the default? You told me to :) In bug
http://www.dbmail.org/mantis/view.php?id=265 you wrote:

... you do need to change dbmail.conf and add two new entries:

encoding=utf8
default_msg_encoding=utf8

So I did it.

Nevertheless, I tried changing it to WINDOWS-1250, with the same result.

Regardless of the default charset, there is an inconsistency between the
cached header value (dbmail_headervalue.headervalue TEXT utf8_general_ci)
and the binary content of the full headers (dbmail_messageblks.messageblk
LONGBLOB BINARY):

mysql> SELECT HEX(headervalue) FROM dbmail_headervalue WHERE id = 607434;

566964656F70726F686CC383C2AD646B612076656C65747268752072796261C385E284A2656EC383C2AD20762042726EC384E280BA2032303037

So V 56 i 69 d 64 e 65 o 6F p 70 r 72 o 6F h 68 l 6C i_acute C383C2AD d 64
k 6B a 61 20 v 76 e 65 l 6C e 65 t 74 r 72 h 68 u 75 20 r 72 y 79 b 62 a 61
r_circ C385E284A2 e 65 n 6E i_acute C383C2AD 20 v 76 20 B 42 r 72 n 6E
e_circ C384E280BA 20 2 32 0 30 0 30 7 37 
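
The sequences in this dump (C383C2AD, C385E284A2, C384E280BA) are exactly
what results when correct UTF8 bytes are read as Windows-1252 and then
encoded to UTF8 a second time. The Python sketch below reproduces the
stored value that way. It only shows that the bytes are consistent with
this kind of double encoding; it is an assumption, not a finding, that this
is what happens, and it says nothing about where (DBMail, the database
connection, or elsewhere). The function names are made up.

```python
# Value returned by HEX(headervalue) in the query above.
STORED_HEX = (
    "566964656F70726F686CC383C2AD646B612076656C65747268752072796261"
    "C385E284A2656EC383C2AD20762042726EC384E280BA2032303037"
)


def double_encode(text: str) -> bytes:
    """Encode to UTF8, misread the bytes as Windows-1252, encode again."""
    return text.encode("utf-8").decode("cp1252").encode("utf-8")


def repair(stored: bytes) -> str:
    """Undo one round of the double encoding modelled above."""
    return stored.decode("utf-8").encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    # i_acute: C3AD becomes C383C2AD after the second encoding.
    print(double_encode("\u00ed").hex().upper())
    # Recover the intended subject from the stored bytes.
    print(repair(bytes.fromhex(STORED_HEX)))
```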

---------------------------------------------------------------------- 
 idk - 23-Mar-07 00:10  
---------------------------------------------------------------------- 
Ohh, where is the rest of my previous comment?? I spent about two hours
writing it during some tests, and I have no backup.... :(((

I wrote a lot of step-by-step information on how to reproduce this... Gah...
Grrrrh

(Probably caused by pasting the IMAP output from a linux shell window, with
only \n at the end of each line?)

So briefly, there are two issues. First, possibly double encoding into UTF8
(see i_acute C383C2AD for example; the correct value is C3AD). Second,
multiple encodings in one header value. I created two more copies of this
message, replaced 8bit chars with a triple $$$, and reinserted them via
sendmail. In one copy I replaced the first three 8bit characters (those
previously prefixed by UTF/q); in the other I replaced only the last one
(previously prefixed by ISO/q). Now the first copy is encoded with UTF/b
and the second with ISO/q, see below.

Is there a chance that IMAP could send only one encoding per header value?
When IMAP sends only one encoding (without switching to another, so that all
characters are under one encoding), MSOE as well as SquirrelMail shows the
header value in the list correctly. 
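
To illustrate what "only one encoding per header value" would look like,
here is a small sketch using Python's standard email.header module. It
decodes the mixed UTF-8/iso-8859-2 subject from the ENVELOPE above and
re-encodes the whole text under a single charset. DBMail itself uses GMime
in C, so this is not its code; the helper names to_single_charset and
charsets_used are hypothetical.

```python
from email.header import Header, decode_header, make_header

# Subject as returned in the ENVELOPE, with two different charsets.
MIXED = (
    "=?UTF-8?q?Videoprohl=C3=ADdka_veletrhu_ryba=C5=99en=C3=AD_?= "
    "=?iso-8859-2?q?v_Brn=EC?= 2007"
)


def charsets_used(value: str) -> set:
    """Return the charsets named by the encoded-words in a header value."""
    return {charset for _, charset in decode_header(value) if charset}


def to_single_charset(value: str, charset: str = "utf-8") -> str:
    """Decode all encoded-words, then re-encode the text in one charset."""
    text = str(make_header(decode_header(value)))
    return Header(text, charset).encode()


if __name__ == "__main__":
    print(sorted(charsets_used(MIXED)))
    single = to_single_charset(MIXED)
    print(single)
    print(sorted(charsets_used(single)))
```

A long value may still be split into several encoded-words, but they all
carry the same charset.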

---------------------------------------------------------------------- 
 idk - 23-Mar-07 00:10  
---------------------------------------------------------------------- 
A001 UID FETCH 555258,555260 (ENVELOPE)
* 95 FETCH (UID 555258 ENVELOPE ("Wed, 21 Mar 2007 18:09:41 +0100"
"=?UTF-8?b?VmlkZW9wcm9obMOtZGthIHZlbGV0cmh1IHJ5YmHFmWVuw60=?= v Brn$$$ UTF
2007" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz"))
((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL))
NIL NIL NIL "<[EMAIL PROTECTED]>"))
* 96 FETCH (UID 555260 ENVELOPE ("Wed, 21 Mar 2007 18:09:41 +0100"
"Videoprohl$$$dka =?iso-8859-2?q?veletrhu_ryba$$$en$$$_v_Brn=EC?= ISO
2007" ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz"))
((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL))
NIL NIL NIL "<[EMAIL PROTECTED]>"))
A001 OK UID FETCH completed 

---------------------------------------------------------------------- 
 idk - 23-Mar-07 01:50  
---------------------------------------------------------------------- 
One more note: I sent two modified messages, the first with raw 8bit
characters and the second with MIME ISO escaping. In both cases DBMail
stores the same value in headervalue, but the envelopes are different. The
first envelope (with mixed encodings) is displayed incorrectly.

mysql> SELECT HEX(headervalue) FROM dbmail_headervalue WHERE
physmessage_id IN (274354,274353) AND headername_id = 7;
C3AD20C599C3AD2078207820782078207820782078207820782078207820782078207820C49B
C3AD20C599C3AD2078207820782078207820782078207820782078207820782078207820C49B

mysql> SELECT envelope FROM dbmail_envelope WHERE physmessage_id IN
(274354,274353);

("Wed, 21 Mar 2007 18:09:41 +0100" "=?UTF-8?b?w60gxZnDrSA=?=
=?iso-8859-2?q?x_x_x_x_x_x_x_x_x_x_x_x_x_x_=EC?=" ((NIL NIL "chytej"
"chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej"
"chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL
"<[EMAIL PROTECTED]>")

("Wed, 21 Mar 2007 18:09:41 +0100"
"=?iso-8859-2?Q?=ED_=F8=ED_x_x_x_x_x_x_x_x_x_x_x_x_x_x_=EC?=" ((NIL NIL
"chytej" "chytej.cz")) ((NIL NIL "chytej" "chytej.cz")) ((NIL NIL "chytej"
"chytej.cz")) ((NIL NIL "undisclosed-recipients" NIL)) NIL NIL NIL
"<[EMAIL PROTECTED]>") 

---------------------------------------------------------------------- 
 maximP - 23-Mar-07 08:16  
---------------------------------------------------------------------- 
Maybe it's better to have 2 fields for header values: utf-8 (for searching)
and 7-bit (for FETCH). 

---------------------------------------------------------------------- 
 AntonZ - 23-Mar-07 10:49  
---------------------------------------------------------------------- 
Please attach the source of this message. The original subject field is
interesting. g_mime_utils_header_encode_text encodes the full string;
multiple encodings can be produced only by
g_mime_utils_header_encode_phrase. 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
22-Mar-07 11:23 idk            New Issue                                    
22-Mar-07 14:42 paul           Note Added: 0001935                          
22-Mar-07 14:42 paul           Relationship added       related to 0000538  
22-Mar-07 16:25 idk            Note Added: 0001936                          
22-Mar-07 16:28 idk            File Added: msoe.jpg                         
22-Mar-07 16:28 idk            File Added: squirrel.jpg                     
22-Mar-07 16:56 paul           Note Added: 0001937                          
22-Mar-07 23:48 idk            Note Added: 0001939                          
23-Mar-07 00:10 idk            Note Added: 0001940                          
23-Mar-07 00:10 idk            Note Added: 0001941                          
23-Mar-07 01:50 idk            Note Added: 0001942                          
23-Mar-07 08:16 maximP         Note Added: 0001945                          
23-Mar-07 10:49 AntonZ         Note Added: 0001947                          
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
