On Sat, Sep 23 2006, Jason Rumney wrote:

> Kenichi Handa wrote:
>> At least windows-1252 doesn't cover all eight-bit bytes.
>> There are a few invalid bytes: 0x81, 0x8c, 0x8e...
>>
> 0x8c is "Latin capital ligature Oe", and 0x8e is "Latin capital letter Z with
> caron" according to Windows XP character map. 0x8d is missing, as is 0x90
> (nbsp in latin-1). I'm not sure if the latter is just filtered out from
> display in character map though (0x20 space is also not displayed).
NO-BREAK SPACE is A0 in both Latin-1 and windows-1252 (all printable
characters present in Latin-1 are also in windows-1252 at the same position,
i.e. windows-1252 is a superset of Latin-1's printable characters).

,----[ http://en.wikipedia.org/wiki/Windows-1252 ]
| According to the information on Microsoft's and the Unicode
| Consortium's websites positions 81, 8D, 8F, 90, and 9D are
| unused. However the Windows API call for converting from code pages
| to Unicode maps these to the corresponding C1 control codes. The
| euro character at position 80 was not present in earlier versions of
| this code page, nor were the S and Z with caron (háček)
`----

While I don't know whether these five positions (81, 8D, 8F, 90, and 9D) are
sufficient to distinguish raw-text from windows-1252, together with Eli's
suggestion (detect null bytes) they might give good results.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

_______________________________________________
emacs-pretest-bug mailing list
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
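The detection idea from the thread (reject data containing any of the five unused windows-1252 positions, plus Eli's NUL-byte check) can be sketched in Python as below. This is only an illustration, not what Emacs implements: the function names are hypothetical, and passing the check means the bytes are merely *consistent* with windows-1252 (plain ASCII or Latin-1 text passes too). The cross-check uses Python's standard `cp1252` codec, which follows the Unicode Consortium mapping and rejects the five positions in strict mode, unlike the Windows API, which maps them to C1 controls.

```python
# Hypothetical sketch of the heuristic discussed in the thread; names are illustrative.

# Positions listed as unused in windows-1252 (per the quoted Wikipedia text).
CP1252_UNUSED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def looks_like_cp1252(data: bytes) -> bool:
    """Return False if data contains a NUL byte (Eli's suggestion) or an
    unused windows-1252 position; True only means "not ruled out"."""
    return not any(b == 0x00 or b in CP1252_UNUSED for b in data)


def decodes_strictly_as_cp1252(data: bytes) -> bool:
    """Cross-check with Python's built-in cp1252 codec. In strict mode it
    rejects the five unused positions, but it does accept NUL."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True


def latin1_printable_range_matches() -> bool:
    """Check Reiner's point: bytes 0xA0-0xFF decode identically in
    windows-1252 and Latin-1 (0xA0 is NO-BREAK SPACE in both)."""
    high = bytes(range(0xA0, 0x100))
    return high.decode("cp1252") == high.decode("latin-1")


if __name__ == "__main__":
    print(looks_like_cp1252("Œ Ž €".encode("cp1252")))  # True
    print(looks_like_cp1252(b"abc\x00"))                # False
    print(decodes_strictly_as_cp1252(b"\x90"))          # False
    print(latin1_printable_range_matches())             # True
```

A real detector would combine this with other evidence (for example, how often bytes in 0x80-0x9F occur), since many non-windows-1252 files never contain the rejected bytes.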
