Re: UTF problems (bugs)

Thomas Bruederli Tue, 14 Mar 2006 07:34:52 -0800

Håkan Lindqvist wrote:
> On tis, 2006-03-14 at 16:05 +0100, Thomas Bruederli wrote:
>> If a message specifies its charset in the Content-Type header, RC will
>> attempt to convert it to UTF-8. This does not work for HTML messages
>> that have chars encoded as HTML entities. A decoding function handling
>> HTML entities has to be written for that. Anyone?
> 
> But aren't HTML entities already charset agnostic?!


I guess they aren't. An entity like &#252; represents a single-byte char
(code 252; "ü" in ISO-8859-1). As far as I know, the browser will not
display this entity correctly on a UTF-8 page because it expects
multi-byte characters there.

Please correct me if I'm wrong...
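
One way to check this: HTML 4 defines numeric character references in terms of Unicode code points, so &#252; should name U+00FC whatever charset the message was sent in. A minimal sketch in Python, not RoundCube code; the helper name `entity_bytes` is made up for illustration and it only assumes the standard `html` module:

```python
import html


def entity_bytes(entity: str, charset: str) -> bytes:
    """Decode an HTML character reference, then encode the result in `charset`."""
    char = html.unescape(entity)  # turns "&#252;" into the character U+00FC
    return char.encode(charset)


char = html.unescape("&#252;")
print(hex(ord(char)))                        # code point of the decoded character
print(entity_bytes("&#252;", "iso-8859-1"))  # byte sequence in Latin-1
print(entity_bytes("&#252;", "utf-8"))       # byte sequence in UTF-8
```

If that holds, the same entity gives one byte in ISO-8859-1 and two bytes in UTF-8, which would mean the entity text itself can pass through the charset conversion untouched.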
> 
> Do HTML entities really have to be transformed in any way?
> 
> (Sorry that I jump into this discussion without fully reading up on what
> has been said before, but it just sounds so weird to me.)
> 
> 
> /Håkan

~Thomas
