Werner Lemberg writes:

> Can you provide a complete list of encodings supported on XEmacs (the
> latest version preferred)? I would like to mark them correctly in my
> table for reference purposes.
XEmacs 21.5.24 appears to have the following coding-systems (excluding
useless iso-2022 variants):

  iso-8859-1   iso-8859-2   iso-8859-3   iso-8859-4   iso-8859-5
  iso-8859-6
  iso-8859-7 = greek-iso-8bit
  iso-8859-8   iso-8859-8-e   iso-8859-9   iso-8859-15   iso-8859-16
  koi8-r   alternativnyj
  gb2312 = cn-gb-2312 = chinese-euc
  hz = hz-gb-2312
  big5 = cn-big5
  iso-2022-jp = junet
  iso-2022-jp-1978-irv = old-jis
  iso-2022-jp-2   jis7   jis8
  euc-jp = euc-japan = japanese-euc
  shift_jis = shift-jis
  iso-2022-int-1
  euc-kr = euc-korea
  iso-2022-kr = korean-iso-7bit-lock
  tis-620 = tis620 = th-tis620 = thai-tis620
  tibetan = tibetan-iso-8bit
  viscii = vietnamese-viscii
  vscii = vietnamese-vscii
  viqr = vietnamese-viqr
  devanagari = in-is13194-devanagari
  lao
  windows-037   windows-437   windows-500   windows-708   windows-709
  windows-710   windows-720   windows-737   windows-775   windows-850
  windows-852   windows-855   windows-857   windows-860   windows-861
  windows-862   windows-863   windows-864   windows-865   windows-866
  windows-869   windows-874   windows-875   windows-932   windows-936
  windows-949   windows-950   windows-1026  windows-1200  windows-1250
  windows-1251  windows-1252  windows-1253  windows-1254  windows-1255
  windows-1256  windows-1257  windows-1258  windows-1361  windows-10000
  windows-10001 windows-10006 windows-10007 windows-10029 windows-10079
  windows-10081

> > - EUC-JISX0213 and Shift_JISX0213 are supported by glibc and
> >   libiconv nowadays. You can add them to the table.
>
> I suppose those encodings exist on XEmacs, right?

These encodings are not built into XEmacs; rather, they come with the
Mule-UCS add-on.

> > - In BOM_table, I would not comment out the little-endian UTF-32
> >   BOM. It is the only way to prevent misinterpreting a file in
> >   little-endian UTF-32 as little-endian UTF-16. You have to trust
> >   that the input file will not have NUL characters.
>
> Well, it's actually not necessary to make a difference: the `extract'
> method of groff's `string' class removes all null bytes before
> passing the data to the function which tests for the coding tag --
> note that `check_encoding_tag' is called before `iconv'.

You are confusing me now, because check_encoding_tag looks for a
"-*- ... -*-" line -- which is useless if a manual page happens to be
encoded in UTF-16 or UTF-32.

It is even more confusing to see how the result of get_BOM is used:
get_BOM splits the input into 'BOM' and 'data', and later the two are
pasted together again without the 'BOM' value ever being looked at.

The way I would implement it: if a BOM has been found that indicates a
particular UTF-8/16/32 variant, it sets the value of the 'encoding'
variable directly, without even calling 'check_encoding_tag'. Because
if you find a UTF-16 encoded file that carries a
"-*- coding: iso-8859-15 -*-" line, the encoding is really UTF-16.
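To make this concrete, here is a rough sketch of the detection order I
have in mind. This is not groff's actual code: apart from the name
check_encoding_tag (stubbed out below), the table layout and function
names are invented for illustration. It shows two things: the longer
UTF-32 signatures must be tested before the UTF-16 ones, and a
recognized BOM settles the encoding without consulting the coding tag.

  #include <cstdio>
  #include <cstring>

  struct bom_entry {
    const char *encoding;
    const char *bom;
    std::size_t len;
  };

  // Longer signatures first: the UTF-32LE BOM (FF FE 00 00) begins
  // with the UTF-16LE BOM (FF FE), so testing UTF-16LE first would
  // misidentify little-endian UTF-32 input.
  static const bom_entry BOM_table[] = {
    { "UTF-32BE", "\x00\x00\xFE\xFF", 4 },
    { "UTF-32LE", "\xFF\xFE\x00\x00", 4 },
    { "UTF-8",    "\xEF\xBB\xBF",     3 },
    { "UTF-16BE", "\xFE\xFF",         2 },
    { "UTF-16LE", "\xFF\xFE",         2 },
  };

  // Stub for the fallback; the real check_encoding_tag scans the
  // input for an Emacs-style "-*- coding: ... -*-" line.
  static const char *check_encoding_tag(const char *, std::size_t)
  {
    return 0;
  }

  // A BOM, if present, determines the encoding by itself; the coding
  // tag is consulted only for BOM-less input.
  static const char *detect_encoding(const char *data, std::size_t len)
  {
    for (const bom_entry &e : BOM_table)
      if (len >= e.len && std::memcmp(data, e.bom, e.len) == 0)
        return e.encoding;
    return check_encoding_tag(data, len);
  }

  int main()
  {
    const char sample[] = "\xFF\xFE\x00\x00(rest of the file)";
    const char *enc = detect_encoding(sample, sizeof sample - 1);
    // Prints UTF-32LE; with UTF-16LE tested first, or with the
    // UTF-32LE entry commented out, it would print UTF-16LE.
    std::printf("%s\n", enc ? enc : "no BOM, no coding tag");
    return 0;
  }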
Bruno

_______________________________________________
Groff mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/groff