Hi All,
The ZIP format (http://www.info-zip.org/pub/infozip/doc/) does not appear to specify the text encoding of the stored filenames. This causes problems for unzip utilities when they extract .ZIP files whose filenames use a non-UTF-8 encoding.
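
To make the problem concrete, here is a small sketch using Python's zipfile module (not one of the tools discussed here). zipfile decodes a stored name as UTF-8 only when general-purpose flag bit 11 (0x800) is set, a flag added in later revisions of PKWARE's APPNOTE. Otherwise it decodes the name as CP437, whatever the real encoding was. The helper `raw_name` is hypothetical. It recovers the bytes as stored so that a detector can look at them, and it assumes the archive was opened without zipfile's `metadata_encoding` argument (Python 3.11+).

```python
import zipfile

# General-purpose flag bit 11: "filename is UTF-8" (later APPNOTE revisions).
UTF8_FLAG = 0x800


def raw_name(info: zipfile.ZipInfo) -> bytes:
    """Recover the filename bytes as stored in the archive (hypothetical helper)."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename.encode("utf-8")
    # Without the flag, zipfile decoded the name as CP437. CP437 defines all
    # 256 byte values, so encoding back to CP437 gives the original bytes.
    # Assumes the ZipFile was opened without metadata_encoding.
    return info.filename.encode("cp437")
```

For example, a name stored as the single byte 0x80 with no flag set comes back from zipfile as "Ç", and `raw_name` turns it back into `b"\x80"`.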

Programs such as "unzip", "file-roller" (GNOME, at http://fileroller.sourceforge.net/) and "ark" (KDE) cannot guess the encoding of the filenames, so they cannot convert them to UTF-8 automatically.

One "workaround" would be to detect the encoding and automagically convert the filenames to UTF-8.

Is there a library or sample program that can perform this kind of "encoding detection" on short strings of unknown encoding
(or that can choose from a shorter list of candidate encodings than "iconv --list" gives)?
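
A rough sketch of the second idea, picking from a short candidate list, is shown below. It is a crude heuristic, not an existing library. The names `guess_encoding` and `plausibility`, the Greek-oriented candidate list, and the scoring weights are all illustrative assumptions. Strict UTF-8 is accepted first because non-ASCII legacy bytes rarely form valid UTF-8 by accident. Each remaining candidate is then scored by how many non-ASCII letters its decoding produces, with penalties for control characters and symbols such as box-drawing glyphs.

```python
import unicodedata
from typing import Optional

# Hypothetical candidate list for a Greek-locale user; not from any standard.
GREEK_CANDIDATES = ["cp737", "cp1253", "iso8859_7", "cp437"]


def plausibility(text: str) -> int:
    """Crude score: +1 per non-ASCII letter, -5 per control char or symbol."""
    score = 0
    for ch in text:
        if ch.isascii():
            continue  # ASCII bytes read the same in all these code pages
        cat = unicodedata.category(ch)
        if cat.startswith("L"):
            score += 1
        elif cat.startswith("C") or cat in ("So", "Sm"):
            score -= 5  # box drawing, math signs and controls are unlikely in names
    return score


def guess_encoding(raw: bytes, candidates=GREEK_CANDIDATES) -> Optional[str]:
    """Return the most plausible encoding for a short filename (sketch only)."""
    try:
        raw.decode("utf-8")  # strict decode; pure ASCII also ends up here
        return "utf-8"
    except UnicodeDecodeError:
        pass
    best, best_score = None, None
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # some byte is undefined in this code page
        s = plausibility(text)
        if best_score is None or s > best_score:
            best, best_score = enc, s  # strict '>' keeps the earlier candidate on ties
    return best
```

The candidate list matters more than the scoring. For example, the CP737 bytes for "λογος" also decode to five Cyrillic letters in CP866, so a detector without a locale-specific shortlist could not tell the two apart with this kind of score.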

It would be good to have a common solution to this problem, at least for file-roller and ark,
which are graphical tools.

Any suggestions?

Simos

P.S.
If you would like to experiment with your own ZIP application,
try this archive:
http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip
The filename inside it is encoded in CP737 (as iconv names it). The open-source ZIP tools (unzip, file-roller, ark) all fail to detect the encoding, whereas WinZip detects it correctly.
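
Once the encoding is known, the repair itself is a plain re-decode, the same conversion as `iconv -f CP737 -t UTF-8`. The sketch below assumes a tool that misreads names as CP437, which is Python zipfile's fallback. It uses the Greek word "λογος" as a stand-in, not the actual filename from the archive above. `fix_name` is a hypothetical helper.

```python
def fix_name(misread: str, actual: str = "cp737") -> str:
    """Undo a CP437 misreading of a name whose bytes were really `actual`."""
    return misread.encode("cp437").decode(actual)


# What a CP437-assuming tool would show for CP737 bytes:
garbled = "λογος".encode("cp737").decode("cp437")
print(garbled)            # óªÜª¬
print(fix_name(garbled))  # λογος
```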

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
