Hi Simos,

It's completely impossible to detect which of the 8-bit encodings is used without any further knowledge (for instance, of the language in use).
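As a quick illustration of this point, here is a minimal Python sketch. It assumes that Python's standard codecs `iso8859_1`, `iso8859_2`, `iso8859_4` and `iso8859_5` stand in for the character sets. The same bytes decode without any error under all four, but each decoding produces different text, so the bytes themselves give no hint about which one was meant. The helper names `decode_all` and `all_high_bytes_valid` are hypothetical and exist only for this example.

```python
"""Show that one byte string is valid in several 8-bit encodings at once.

Every byte in 0xA1-0xFF has an assigned character in ISO-8859-1, -2, -4
and -5, so strict decoding never fails and cannot reveal the encoding.
"""

# Hypothetical candidate list for this illustration (Python codec names).
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_4", "iso8859_5"]


def decode_all(data: bytes) -> dict[str, str]:
    """Decode `data` with every candidate codec.

    bytes.decode() uses errors="strict" by default, so an invalid byte
    would raise UnicodeDecodeError instead of being silently replaced.
    """
    return {codec: data.decode(codec) for codec in CANDIDATES}


def all_high_bytes_valid() -> bool:
    """Return True if the whole 0xA1-0xFF range decodes in every candidate."""
    high = bytes(range(0xA1, 0x100))
    try:
        decode_all(high)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    sample = bytes([0xB1, 0xE6, 0xF0])
    # Same three bytes, four different (and equally "valid") readings.
    for codec, text in decode_all(sample).items():
        print(f"{codec:10} -> {text!r}")
    print("0xA1-0xFF valid in all candidates:", all_high_bytes_valid())
```

Nothing in the loop can prefer one result over another: choosing among them needs outside knowledge, such as which language the text is expected to be in.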
To be able to actually decide on one of the many 8-bit encodings suitable for a language, one would also need to know properties of that language (such as the frequency of each letter in it), and even then it's unlikely to work for strings as short as filenames.

If you need a formal proof of "undetectability", here's one:

 - every valid ISO-8859-1 string is also a completely valid ISO-8859-2 (or -4, or -5) string, because all of these sets assign a character to exactly the same positions 0xa1-0xff; i.e. by looking at the bytes alone you can *never* determine whether a character that exists in one set but not in the other is actually being used.

Today at 20:16, Simos Xenitellis wrote:

> P.S.
> If you would like to experiment with your own ZIP application,
> try
> http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip
> The filename is encoded in CP737 (a la iconv). All open-source ZIP
> tools (=unzip, file-roller, ark) fail to detect the encoding.
> WinZip is able to detect the encoding.

My guess is that WinZip is running on a Greek Windows system, and that WinZip uses the old IBM encodings for internationalized filenames on such systems, assuming CP737 on a Greek system.

Can you confirm or dispute my assumption (e.g. by trying it on a non-Greek Windows system, or just by confirming that this was actually attempted on a non-Greek system)?

Cheers,
Danilo

-- 
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
