I'm not aware of any method of automatically detecting which 8-bit character set was used. However, one solution might be to add a conversion library to zip utilities so that they could optionally convert file names between character sets. Feeding just the file names, and nothing else, to libiconv could accomplish that.
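Here is a minimal sketch of that conversion. It uses Python's built-in codecs as a stand-in for libiconv (an assumption; a real zip utility would more likely call iconv(3) from C). The function name `convert_name` is hypothetical. Like iconv, it does no detection: the user has to supply the source charset.

```python
def convert_name(raw: bytes, source_charset: str, target_charset: str = "utf-8") -> bytes:
    """Re-encode a file name's raw bytes from one charset to another.

    Hypothetical helper standing in for a libiconv call. No detection is
    done: source_charset must be supplied by the user (e.g. "cp737", the
    Greek DOS code page used in the example quoted below).
    """
    # Decoding raises UnicodeDecodeError if raw is not valid in
    # source_charset; encoding raises UnicodeEncodeError if target_charset
    # cannot represent the result.
    return raw.decode(source_charset).encode(target_charset)


if __name__ == "__main__":
    # Simulate a ZIP entry name that was stored as CP737 bytes.
    stored = "αβγ.zip".encode("cp737")
    print(convert_name(stored, "cp737"))
```

Keeping the charset an explicit, optional parameter matches the idea above: the tool only converts when asked, and it never guesses.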
----- Original Message -----
From: Danilo Segan <[EMAIL PROTECTED]>
Date: Thursday, June 2, 2005 4:07 pm
Subject: Re: How to detect the encoding of a string?

> Hi Simos,
>
> It's completely impossible to detect which of the 8-bit encodings is
> used without any further knowledge (for instance, of the language in
> use).
>
> To be able to actually decide on one of the many 8-bit encodings
> suitable for a language, one would also need to know language
> properties (such as the frequency of each letter in it), but it's still
> unlikely that this would work for strings as short as filenames.
>
> If you need a formal proof of "undetectability", here's one:
> - a valid ISO-8859-1 string is always a completely valid ISO-8859-2 (or
>   -4, -5) string (they occupy exactly the same positions, 0xa1-0xff),
>   i.e. you can *never* determine whether some character not present in
>   the other set is actually used.
>
> Today at 20:16, Simos Xenitellis wrote:
>
> > P.S.
> > If you would like to experiment with your own ZIP application,
> > try
> > http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip
> > The filename is encoded in CP737 (a la iconv). All open-source ZIP
> > tools (=unzip, file-roller, ark) fail to detect the encoding.
> > WinZip is able to detect the encoding.
>
> My guess is that WinZip is running on a Greek Windows system, and that
> WinZip uses the old IBM encodings for i18n names there, assuming CP737
> on a Greek system.
>
> Can you confirm or dispute my assumption (e.g. by trying it on a
> non-Greek Windows system, or just confirming that this was actually
> attempted on a non-Greek system)?
>
> Cheers,
> Danilo
>
> --
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/linux-utf8/
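Danilo's proof of undetectability can be checked mechanically. The sketch below is in Python. The candidate list and the helper name `decodings` are illustrative assumptions. It decodes the same bytes under several ISO-8859 variants. Every byte in 0xA1-0xFF is defined in all of them, so each decode succeeds. Nothing in the bytes alone says which charset was meant.

```python
# Illustrative candidate list (an assumption): ISO-8859-1 plus the
# variants named in the proof (-2, -4, -5).
CANDIDATES = ["iso8859-1", "iso8859-2", "iso8859-4", "iso8859-5"]


def decodings(raw: bytes) -> dict:
    """Map each candidate charset under which raw decodes cleanly to its text."""
    results = {}
    for charset in CANDIDATES:
        try:
            results[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            pass  # raw is not a valid byte sequence in this charset
    return results


if __name__ == "__main__":
    # Every byte of the upper half is accepted by every candidate...
    upper_half = bytes(range(0xA1, 0x100))
    print(sorted(decodings(upper_half)))
    # ...yet the same two bytes mean different letters in different charsets.
    for charset, text in decodings(b"\xe1\xe2").items():
        print(charset, text)
```

Because every decode succeeds, "it decoded without errors" cannot tell these charsets apart. Choosing one requires outside knowledge, such as the language or the system the archive was created on.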
