On Jun 1, 2009, at 8:36 PM, till wrote:

> And what's the performance trade off to always converting?


It isn't the processing to do the encoding conversion; it is that
each message gets a regex search to see how it should be converted.
The way I read that code, even if the message is UTF-8, the regex
will be run to determine the validity of the if statement.

> if ($from == "ISO-8859-1" && preg_match("/[\x80-\x9F]/", $str))
> $from = "WINDOWS-1252";


Maybe there should be a nested if statement, so that only messages
that are marked as ISO-8859-1 are tested for the Black Hole of Windows-1252.

if ($from == "ISO-8859-1")
        if (preg_match("/[\x80-\x9F]/", $str))
                $from = "WINDOWS-1252";

With UTF-8 becoming more common, the regex would then be skipped
for what is likely the bulk of messages.
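
As a rough sketch of that nested check wrapped in a function (the name
guess_source_charset() is made up for illustration and is not RoundCube's
actual code): note that PHP's && operator short-circuits, so the one-line
test quoted above should already skip preg_match() whenever $from is not
ISO-8859-1; the nested form mostly makes that ordering explicit.

```php
<?php
// Hypothetical helper, not taken from RoundCube: pick the charset to
// convert from, relabelling ISO-8859-1 only when the data calls for it.
function guess_source_charset(string $from, string $str): string
{
    // Cheap string comparison first; anything not labelled ISO-8859-1
    // (for example UTF-8) never reaches the regex.
    if ($from === 'ISO-8859-1') {
        // Bytes 0x80-0x9F are control codes in ISO-8859-1 but printable
        // characters (curly quotes, euro sign, ...) in Windows-1252.
        if (preg_match('/[\x80-\x9F]/', $str)) {
            return 'WINDOWS-1252';
        }
    }
    return $from;
}
```

A caller would then pass the returned label, rather than the header's
label, to whatever conversion routine is in use.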

However, the same problem could occur no matter what the message
header says the encoding should be. A message that has a UTF-8 header
could very well have WINDOWS-1252 encoding inside it. The above
solution works because, as the OP said:

> The Windows-1252 character set is effectively a superset of the
> iso-8859-1 character set,

That is not true of WINDOWS-1252 encoded data marked as, or should I say
masquerading as, UTF-8 content.
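
To illustrate what catching that case might involve, here is a minimal
sketch of one possible heuristic: check whether the body is well-formed
UTF-8 at all. The function name is_valid_utf8() is hypothetical and this is
not something RoundCube is known to do. It relies on preg_match() returning
false when the /u modifier is set and the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: true when $str is well-formed UTF-8.
function is_valid_utf8(string $str): bool
{
    // With the /u modifier PCRE validates the subject as UTF-8 first;
    // preg_match() returns false (not 0) when that validation fails,
    // and 1 when the empty pattern matches a valid string.
    return preg_match('//u', $str) === 1;
}
```

For example, "caf\xC3\xA9" (UTF-8) passes, while "caf\xE9" (the same word
in Windows-1252) fails. Pure ASCII passes either way, so this only flags
messages that actually contain high bytes, and it is still a scan of every
message, which is exactly the cost being questioned here.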

Does RC really want to parse all messages and apply heuristics to
determine the encoding?
Yes, this is a relatively simple case, but you open the door for
other patches to solve other specific encoding mismatches.
We have no numbers as to how often this exact encoding mismatch
happens other than "I ran into this once."
No offense to the OP, he provided a simple fix to the problem, but it
is a very specific problem.

Here's one to fix:
If you subscribe to a mail list run by mailman in plain digest mode,
it doesn't convert the incoming messages to a consistent encoding. It
just mashes each original message, in its original encoding, into the
digest message that is labeled as 7-bit us-ascii. How does RoundCube
handle that? It punts because it is an upstream problem.

BTW, the MIME digest mode of mailman makes each message a separate
part that is labeled with its own encoding (but then you get
attachments to messages, which is sub-optimal for me).


-- 
Charles Dostale
System Admin - Silver Oaks Communications
http://www.silveroaks.com/
824 17th Street, Moline  IL  61265

_______________________________________________
List info: http://lists.roundcube.net/dev/
