How to detect the encoding of a string?

Simos Xenitellis Thu, 02 Jun 2005 11:17:53 -0700


Hi All,

The ZIP format (http://www.info-zip.org/pub/infozip/doc/) appears not to specify the text encoding of the filenames of the compressed files, which causes a problem with unzip utilities when they try to uncompress .ZIP files that include filenames in non-UTF-8 encodings.

ZIP programs such as "unzip", "file-roller" (GNOME, at http://fileroller.sourceforge.net/) and "ark" (KDE) cannot guess the encoding of the filenames and automatically convert them to UTF-8.

To solve this problem, a "workaround" would be to detect the encoding and automagically convert to UTF-8.

Is there a library or sample program that can do such "encoding detection" based on short strings of unknown encoding (or choose among a list of candidate encodings smaller than the output of "iconv --list")?
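
To make the question concrete, here is a minimal Python sketch of the "choose from a small list" idea. It is only an illustration under stated assumptions, not an existing library: the candidate list, the function names script_score and guess_encoding, and the scoring rule (prefer the candidate whose non-ASCII characters are mostly letters of a single script) are all hypothetical choices.

```python
import unicodedata

# Hypothetical shortlist; order matters because ties go to the earlier entry.
CANDIDATES = ["utf-8", "cp737", "iso-8859-7", "cp1253", "cp437"]


def script_score(text):
    """Share of non-ASCII characters that are letters of one dominant script."""
    scripts = {}
    non_ascii = 0
    for ch in text:
        if ord(ch) < 128:
            continue
        non_ascii += 1
        if ch.isalpha():
            # First word of the Unicode name, e.g. "GREEK" or "LATIN".
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            scripts[script] = scripts.get(script, 0) + 1
    if non_ascii == 0:
        return 1.0  # pure ASCII decodes identically in these candidates
    return max(scripts.values(), default=0) / non_ascii


def guess_encoding(raw, candidates=CANDIDATES):
    """Return (encoding, decoded_text) for the best candidate, or None."""
    best = None
    best_score = -1.0
    for enc in candidates:
        try:
            text = raw.decode(enc)  # strict: invalid bytes rule the candidate out
        except UnicodeDecodeError:
            continue
        s = script_score(text)
        if s > best_score:  # strictly greater, so earlier candidates win ties
            best, best_score = (enc, text), s
    return best


if __name__ == "__main__":
    raw = "λογος".encode("cp737")  # stand-in for a filename read from a ZIP
    print(guess_encoding(raw))
```

A heuristic like this is weak on very short names, and closely related code pages can score equally well, so a real tool would probably still want to let the user override the guess.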

It would be good to have something common to solve the problem for at least file-roller and ark, which are based on graphical interfaces.

Any suggestions?

Simos

P.S.
If you would like to experiment with your own ZIP application, try

http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip

The filename is encoded in CP737 (a la iconv). All open-source ZIP tools (unzip, file-roller, ark) fail to detect the encoding.

WinZip is able to detect the encoding.
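
For comparison, once the encoding is known the conversion step itself is small. The following Python sketch does roughly what "iconv -f CP737 -t UTF-8" does for a byte string; the helper name to_utf8 and the sample word are illustrative assumptions, not taken from the archive above.

```python
def to_utf8(raw, source_encoding="cp737"):
    """Re-encode bytes from a known legacy encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    raw = "λογος".encode("cp737")  # sample bytes as a CP737 tool would store them
    # Decoding with the wrong code page yields unrelated characters.
    print(raw.decode("cp437"))
    # Decoding with the right one, then re-encoding, recovers the Greek text.
    print(to_utf8(raw).decode("utf-8"))
```

The hard part remains choosing source_encoding, which is exactly what the tools above cannot do.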

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
