I'm not aware of any method for automatically detecting which 8-bit
character set was used.  However, one solution might be to put a
conversion library into zip utilities that could optionally convert file
names between character sets.  Feeding just the file names, and nothing
else, to libiconv could accomplish that.
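
As a rough sketch of what I mean, here is the conversion step in Python.
It uses Python's built-in codecs as a stand-in for libiconv, and the
function name convert_name is made up for illustration.  It assumes the
user tells the tool the source character set, since it cannot be detected.

```python
def convert_name(raw: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode a single archive member name from `source` to `target`.

    Only the name is converted; the file contents are never touched.
    `source` must be supplied by the user because it cannot be detected.
    """
    # decode() raises UnicodeDecodeError if `raw` is not valid in `source`
    return raw.decode(source).encode(target)


# Example: a Greek name stored as CP737 bytes, converted to UTF-8.
stored = "λόγος".encode("cp737")
converted = convert_name(stored, "cp737")
print(converted.decode("utf-8"))
```

The same conversion is what "iconv -f CP737 -t UTF-8" performs on a
stream of bytes.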

----- Original Message -----
From: Danilo Segan <[EMAIL PROTECTED]>
Date: Thursday, June 2, 2005 4:07 pm
Subject: Re: How to detect the encoding of a string?

> Hi Simos,
> 
> It's completely impossible to detect which of the 8-bit encodings is
> used without any further knowledge (for instance, of the language in
> use).  
> 
> To actually decide on one of the many 8-bit encodings suitable for
> a language, one would also need to know properties of the language
> (such as the frequency of each letter in it), but it is still
> unlikely that this would work for strings as short as file names.
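
To illustrate the kind of guessing described above, here is a crude
sketch in Python.  It assumes we already know the name is Greek, and
instead of real letter frequencies it scores each candidate by the share
of decoded characters that are Greek letters.  The function names and the
candidate list are my own assumptions, and on short or mixed names the
result can easily be wrong.

```python
def greek_share(text: str) -> float:
    """Fraction of characters in `text` that are Greek letters."""
    if not text:
        return 0.0
    greek = sum(1 for ch in text
                if "\u0370" <= ch <= "\u03ff" and ch.isalpha())
    return greek / len(text)


def guess_encoding(raw: bytes,
                   candidates=("cp737", "iso8859_7", "cp1253")) -> str:
    """Pick the candidate whose decoding looks most like Greek text.

    This is a heuristic only: it needs the language as prior knowledge
    and says nothing about encodings outside `candidates`.
    """
    scores = {
        # Undefined bytes become U+FFFD, which does not count as a letter.
        name: greek_share(raw.decode(name, errors="replace"))
        for name in candidates
    }
    return max(scores, key=scores.get)


name = "αποχαιρετιστήριο".encode("cp737")
print(guess_encoding(name))
```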
> 
> If you need a formal proof of "undetectability", here's one:
> - a valid ISO-8859-1 string is always a completely valid ISO-8859-2
> (or -4, -5) string (they occupy exactly the same spots 0xa1-0xff),
> i.e. you can *never* tell whether a byte stands for a character
> that is missing from the other set.
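
A small Python check of the argument above, using Python's built-in
codecs for the four character sets named.  The sample bytes are arbitrary
and chosen only for illustration.

```python
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_4", "iso8859_5"]

# Every byte in the range mentioned above (0xa1-0xff).
high_bytes = bytes(range(0xA1, 0x100))

# Decoding succeeds under every candidate, so validity alone can
# never rule any of them out.
decoded = {name: high_bytes.decode(name) for name in CANDIDATES}

# The same two bytes are accepted everywhere, but read differently.
sample = b"\xe8\xe9"
readings = {name: sample.decode(name) for name in CANDIDATES}
for name, text in readings.items():
    print(name, text)
```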
> 
> Today at 20:16, Simos Xenitellis wrote:
> 
> > P.S.
> > If you would like to experiment with your own ZIP application, try
> > http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip
> > The filename is encoded in CP737 (a la iconv). All open-source ZIP
> > tools (=unzip, file-roller, ark) fail to detect the encoding.
> > WinZip is able to detect the encoding.
> 
> My guess is that WinZip is running on a Greek Windows system, where
> it uses the old IBM encodings for non-ASCII names and so assumes
> CP737.
> 
> Can you confirm or dispute my assumption (e.g. by trying it on a
> non-Greek Windows system, or just confirming that this was actually
> attempted on a non-Greek system)?
> 
> Cheers,
> Danilo
> 
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/
> 
> 
