-----Original Message----- From: Peter B. West [mailto:[EMAIL PROTECTED]]
Hi Peter,
160 is A0 in hex; 1100 0000 in binary.
Slight adjustment... 160 = 2^7 + 2^5, so 1010 0000? (Fresh out of bed, ay? ;-))
Precisely. And I'm not good in the morning.
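(A quick check in a Python shell confirms the corrected value; this snippet is my own illustration, not part of the original exchange:

>>> hex(160), bin(160)
('0xa0', '0b10100000')
)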
I was thinking along the same lines, but haven't discovered the mapping yet...
When this value is represented in UTF-8, it becomes the two-byte sequence C2 A0.
How exactly?
If you are on a Linux system, man utf-8 explains it well. The Unicode manual has an appendix on transformations, which is probably available online.
This is from the man page.
ENCODING
The following byte sequences are used to represent a character. The
sequence to be used depends on the UCS code number of the character:
0x00000000 - 0x0000007F: 0xxxxxxx
0x00000080 - 0x000007FF: 110xxxxx 10xxxxxx
0x00000800 - 0x0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx
0x00010000 - 0x001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0x00200000 - 0x03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
0x04000000 - 0x7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
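To make the two-byte row of that table concrete, here is a minimal Python sketch (my own illustration, not from the original message; the function name utf8_two_byte is made up): for code points in 0x80..0x7FF, the top five bits go into a 110xxxxx byte and the low six bits into a 10xxxxxx byte.

def utf8_two_byte(cp):
    # Two-byte UTF-8 form, valid only for code points 0x80..0x7FF.
    assert 0x80 <= cp <= 0x7FF
    first = 0xC0 | (cp >> 6)     # 110xxxxx: the high five bits of the code point
    second = 0x80 | (cp & 0x3F)  # 10xxxxxx: the low six bits of the code point
    return bytes([first, second])

print(utf8_two_byte(0xA0).hex())  # prints 'c2a0'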
It's a beautiful encoding. A0 (1010 0000) uses the two-byte form.
So,
110xxxxx 10xxxxxx
where the five x bits of the first byte are 000 followed by the top two bits of the original, i.e. 10, and the six x bits of the second byte are the lower six bits of the original, i.e. 10 0000. This becomes

110.000.10  10.10 0000
1100 0010  1010 0000
C2 A0
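(Python's built-in codec agrees with the hand computation; again, just an illustrative check:

>>> '\u00a0'.encode('utf-8')
b'\xc2\xa0'
)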
So the result read as ISO 8859-1 is Â followed by A0, where the A0 will be interpreted as an ordinary space, I believe.
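(Decoding the two bytes as ISO 8859-1 shows that misreading directly, since 0xC2 is Â in Latin-1 and 0xA0 a no-break space; another illustrative check:

>>> b'\xc2\xa0'.decode('iso-8859-1')
'Â\xa0'
)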
Peter
--
Peter B. West <http://cv.pbw.id.au/>
Folio <http://defoe.sourceforge.net/folio/> <http://folio.bkbits.net/>