Hi All,
The ZIP format (http://www.info-zip.org/pub/infozip/doc/) appears not to
specify the text encoding of the filenames of the compressed files. This
causes a problem when unzip utilities try to uncompress .ZIP files that
include filenames in non-UTF-8 encodings.
ZIP programs such as "unzip", "file-roller" (GNOME, at
http://fileroller.sourceforge.net/) and "ark" (KDE) cannot guess the
encoding of the filenames and automatically convert them to UTF-8.
A workaround for this problem would be to detect the encoding of the
filenames and automagically convert them to UTF-8.
Is there a library or sample program that can do such "encoding
detection" on short strings of unknown encoding
(or choose among the encodings in a list smaller than "iconv --list")?
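
To make the question more concrete, here is a rough sketch in Python of
what "choosing from a smaller list" could look like. It is only a
heuristic, not an existing library: the candidate list, the function
names and the idea of passing in the expected script (for example taken
from the user's locale) are all my own assumptions. It first checks
whether the bytes are already valid UTF-8, then picks the candidate
whose decoding gives the largest share of characters in the expected
script.

```python
import unicodedata

# Hypothetical short candidate list; a real tool might derive it from the locale.
CANDIDATES = ["cp437", "iso8859-7", "cp1253", "cp737"]


def script_score(text, script):
    """Fraction of non-ASCII characters whose Unicode name starts with `script`."""
    non_ascii = [ch for ch in text if ord(ch) > 127]
    if not non_ascii:
        return 0.0
    hits = sum(1 for ch in non_ascii
               if unicodedata.name(ch, "").startswith(script))
    return hits / len(non_ascii)


def guess_encoding(raw, candidates=CANDIDATES, script="GREEK"):
    """Return "utf-8", the best-scoring candidate, or None if nothing fits."""
    try:
        raw.decode("utf-8")          # ASCII and valid UTF-8 need no conversion
        return "utf-8"
    except UnicodeDecodeError:
        pass
    best, best_score = None, 0.0
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:   # bytes not valid in this encoding
            continue
        s = script_score(text, script)
        if s > best_score:
            best, best_score = enc, s
    return best
```

With very short names several candidates can score the same, so a real
tool would probably still need to let the user override the guess.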
It would be good to have a common solution to the problem for at least
file-roller and ark, which are graphical front-ends.
Any suggestions?
Simos
P.S.
If you would like to experiment with your own ZIP application,
try
http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip
The filename is encoded in CP737 (as iconv names it). The open-source
ZIP tools (unzip, file-roller, ark) all fail to detect the encoding.
WinZip is able to detect the encoding.
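
For experimenting with such an archive, here is a small sketch using
Python's standard zipfile module. As far as I know, zipfile decodes a
name as CP437 unless the entry's UTF-8 flag (bit 11 of the
general-purpose flags) is set, so the raw bytes can be recovered by
encoding back to CP437 and decoding again with the encoding you suspect.
The helper names and the default of "cp737" are my own choices for
illustration.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recode_name(name, flag_bits, encoding):
    """Undo zipfile's CP437 decoding and decode the raw bytes as `encoding`."""
    if flag_bits & UTF8_FLAG:
        return name                  # already decoded correctly as UTF-8
    return name.encode("cp437").decode(encoding)


def list_names(path, encoding="cp737"):
    """List the entry names of a ZIP file, assuming `encoding` for legacy names."""
    with zipfile.ZipFile(path) as zf:
        return [recode_name(info.filename, info.flag_bits, encoding)
                for info in zf.infolist()]
```

Calling list_names() on the downloaded file with encoding="cp737" should
then show the name in Greek, provided the guess about the encoding is
right.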
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/