Re: local chars displayed as numbers

Reiner Steib Sat, 23 Sep 2006 04:35:04 -0700

On Sat, Sep 23 2006, Jason Rumney wrote:

> Kenichi Handa wrote:
>> At least windows-1252 doesn't cover all eight-bit bytes.
>> There are a few invalid bytes: 0x81, 0x8c, 0x8e...
>>   
> 0x8c is "Latin capital ligature Oe", and 0x8e is "Latin capital letter Z with
> caron" according to Windows XP character map. 0x8d is missing, as is 0x90
> (nbsp in latin-1). I'm not sure if the latter is just filtered out from
> display in character map though (0x20 space is also not displayed).


NO-BREAK SPACE is at A0, not 90, in both Latin-1 and windows-1252. All
printable characters present in Latin-1 are also in windows-1252 at the
same position; i.e. windows-1252 is a superset of Latin-1.
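
This is easy to check outside Emacs. The following is only a sketch in
Python, relying on its bundled "latin-1" and "cp1252" codecs (cp1252 is
Python's name for windows-1252); the helper name same_in_both is made up
for illustration and says nothing about how Emacs decodes these bytes.

```python
# Compare how the two codecs decode the upper half of the byte range.

def same_in_both(byte_value):
    """Return True if the byte decodes to the same character in both codecs."""
    raw = bytes([byte_value])
    try:
        return raw.decode("latin-1") == raw.decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec rejects the unused positions;
        # latin-1 accepts every byte.
        return False

# NO-BREAK SPACE sits at A0 in both encodings.
nbsp_latin1 = bytes([0xA0]).decode("latin-1")
nbsp_cp1252 = bytes([0xA0]).decode("cp1252")

# Collect the positions in 80..FF where the two codecs disagree.
differing = [b for b in range(0x80, 0x100) if not same_in_both(b)]
```

With Python's codecs, every entry in differing falls in the range 80..9F,
which is where Latin-1 has only C1 control codes.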

,----[ http://en.wikipedia.org/wiki/Windows-1252 ]
| According to the information on Microsoft's and the Unicode
| Consortium's websites positions 81, 8D, 8F, 90, and 9D are
| unused. However the Windows API call for converting from code pages
| to Unicode maps these to the corresponding C1 control codes. The
| euro character at position 80 was not present in earlier versions of
| this code page, nor were the S and Z with caron (háček)
`----

I don't know whether these five positions (81, 8D, 8F, 90, and 9D)
are sufficient to distinguish raw-text from windows-1252, but checking
for them together with Eli's suggestion (detecting null bytes) might
give good results.
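
To make the idea concrete, here is a rough sketch in Python. The
function name looks_like_raw_bytes and the constant are hypothetical,
and this is not what Emacs does; it only combines the two checks
mentioned above (the five unused positions plus null bytes).

```python
# Positions that windows-1252 leaves unused, per the Wikipedia excerpt.
UNUSED_IN_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})

def looks_like_raw_bytes(data):
    """Hypothetical heuristic: True if data is unlikely to be windows-1252 text.

    data is a bytes object; iterating over it yields integers 0..255.
    """
    has_null = 0 in data
    has_unused = any(b in UNUSED_IN_CP1252 for b in data)
    return has_null or has_unused
```

Note that this only finds positive evidence: data containing none of
these bytes could still be something other than windows-1252 text.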

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/


_______________________________________________
emacs-pretest-bug mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
