On 7/27/05, Moshe Kaminsky <[EMAIL PROTECTED]> wrote:
> Hi,
> * Fernando Canizo <[EMAIL PROTECTED]> [27/07/05 14:14]:
<snip>
> > I investigate what was in the archives, so i saved a copy (using 'C'
> > command from mutt) of the first message (the one i receive from me)
> > and file says: 'UTF-8 Unicode mail text', check what's inside with
> > hexedit and see that LATIN SMALL LETTER A WITH ACUTE is encoded with
> > this hex: C3 A1 (which is not 00 E1 from unicode chart from
> > http://www.unicode.org/charts/)
> 
> I think this is just the way these characters are represented in utf-8.

Yes, it is.

00E1 hex is '00000000 11100001' in binary.

When encoding this as UTF-8 this value is stored in two bytes.

The last byte will begin with '10' followed by the last 6 bits of data. 

'10 100001' binary or 'A1' in hex.

The first byte will begin with '110' to indicate a two-byte 
sequence, followed by the remaining five significant bits. 

'110 00011' binary or 'C3' hex.

This is correct.
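
As a rough sketch of the same bit-packing (Python is just my pick here, and
encode_two_byte is a made-up helper name, not anything from mutt), the two
steps above could be written like this:

```python
def encode_two_byte(code_point: int) -> bytes:
    """Hand-encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only U+0080..U+07FF fit the two-byte form")
    # First byte: the marker '110' followed by the top 5 bits.
    first = 0b11000000 | (code_point >> 6)
    # Last byte: the marker '10' followed by the low 6 bits.
    last = 0b10000000 | (code_point & 0b00111111)
    return bytes([first, last])


encoded = encode_two_byte(0x00E1)
print(" ".join(f"{b:02X}" for b in encoded))  # C3 A1

# Python's own encoder gives the same two bytes.
assert encoded == "\u00e1".encode("utf-8")
```

This only covers the two-byte case; other ranges use one, three, or four
bytes.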

The problem seems to be that mutt(?) takes this UTF-8 encoded data
and encodes it as UTF-8 again, as if the data were two 8-bit characters.
 
'C3' then becomes 'C3 83' and 'A1' becomes 'C2 A1' 
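
The double encoding can be reproduced with a small sketch. It assumes the
two bytes get misread as ISO-8859-1 (Latin-1) characters before being
encoded again; that is my guess at what "two 8 bit characters" means here,
not something I have confirmed about mutt:

```python
original = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE

# Correct UTF-8 encoding: two bytes, C3 A1.
once = original.encode("utf-8")

# Assumed mistake: each byte is treated as its own Latin-1 character,
# giving the two characters U+00C3 and U+00A1 ...
misread = once.decode("latin-1")

# ... which are then each encoded as UTF-8 a second time.
twice = misread.encode("utf-8")
print(" ".join(f"{b:02X}" for b in twice))  # C3 83 C2 A1

# Under the same assumption, the damage can be undone by reversing the steps.
repaired = twice.decode("utf-8").encode("latin-1").decode("utf-8")
assert repaired == original
```

If the archive copy shows C3 83 C2 A1 where C3 A1 should be, that would
point to this kind of double encoding.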


/Andreas

-- 
gentoo-user@gentoo.org mailing list
