Package: unzip
Version: 6.0-8
Severity: normal

I often receive zip files with Japanese (or other CJK) filenames encoded
in a variety of encodings, usually gbk or cp932.

With older versions of unzip (5.x), I could specify the encoding of the
filenames. That wasn't optimal, but it let me recover the filenames, either
by specifying the correct encoding or by using latin1 to get the "raw"
filenames and converting them myself.
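Converting a name extracted as latin1 back to its real encoding is a lossless round trip, because latin1 maps every byte 0x00-0xff to exactly one character. A minimal Python sketch, assuming the real encoding is already known; `recover_name` is a hypothetical helper, not part of unzip:

```python
def recover_name(latin1_name: str, encoding: str) -> str:
    """Re-decode a filename that was extracted byte-for-byte as latin1.

    Encoding the string back to latin1 restores the original bytes exactly,
    which can then be decoded with the archive's real encoding.
    """
    raw = latin1_name.encode("latin-1")
    return raw.decode(encoding)


# The gbk directory name from the example in this report, as it looks in latin1.
extracted = "\u00d0\u00b4\u00d5\u00e6\u00bc\u00af"  # "Ð´Õæ¼¯"
print(recover_name(extracted, "gbk"))  # 写真集
```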

Unzip 6.x seems to mangle all these filenames into mojibake, and there
doesn't seem to be any option to switch this off.

Here is an example zip file, created with minizip in a locale using latin1
encoding:

http://data.plan9.de/gbk.zip

It contains a directory whose name is the gbk-encoded string "写真集"
(hex "d0b4d5e6bcaf"), which also happens to be a valid latin1
filename.
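The claims about these bytes (gbk decodes them to "写真集", latin1 accepts them as-is, and they are not valid utf-8, so a utf-8 interpretation cannot apply) can be checked with a short Python sketch:

```python
RAW = bytes.fromhex("d0b4d5e6bcaf")

as_gbk = RAW.decode("gbk")         # the intended name
as_latin1 = RAW.decode("latin-1")  # one character per byte, never fails

try:
    RAW.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xd5 starts a two-byte sequence, but 0xe6 is not a continuation byte
    valid_utf8 = False

print(as_gbk)                              # 写真集
print(as_latin1.encode("latin-1") == RAW)  # True: the latin1 name round-trips
print(valid_utf8)                          # False
```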

The expectation is that unzip creates a directory with exactly that name,
regardless of the gbk encoding, because the byte sequence is also a valid
latin1 filename.

minizip and 7z both create this filename, but unzip mangles it to
"f0a669b52bbb" (hex).
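The mangled bytes look consistent with unzip treating the name as a DOS "OEM" string in code page 850 and translating it to ISO-8859-1: for example 0xd0 ("ð" in cp850) becomes 0xf0 ("ð" in latin1) and 0xe6 ("µ") becomes 0xb5. This is a guess, not confirmed from unzip's source. The Python sketch below reproduces the observed output only with an assumed fallback table for the three cp850 characters that have no latin1 equivalent; `FALLBACK` and `oem850_to_latin1` are hypothetical names:

```python
RAW = bytes.fromhex("d0b4d5e6bcaf")

# Assumed approximations (NOT taken from unzip) for cp850 characters
# that have no latin1 counterpart.
FALLBACK = str.maketrans({
    "\u2524": "\u00a6",  # box drawing vertical and left -> broken bar
    "\u0131": "i",       # dotless i -> i
    "\u255d": "+",       # box drawing double up and left -> plus sign
})


def oem850_to_latin1(raw: bytes) -> bytes:
    """Hypothetical model of an OEM (cp850) to ISO-8859-1 name translation."""
    text = raw.decode("cp850")  # "ð┤ıµ╝»" for the example bytes
    return text.translate(FALLBACK).encode("latin-1")


print(oem850_to_latin1(RAW).hex())  # f0a669b52bbb
```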

Neither -U, -UU, nor -L seems to have any effect on this.

In fact, unzip now seems to mangle every filename that contains bytes
above 127 and is not valid utf-8.

-- System Information:
Debian Release: wheezy/sid
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'unstable'), (500, 'testing'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages unzip depends on:
ii  libbz2-1.0                    1.0.6-4    high-quality block-sorting file co
ii  libc6                         2.13-37    Embedded GNU C Library: Shared lib

unzip recommends no packages.

Versions of packages unzip suggests:
ii  zip                           3.0-3      Archiver for .zip files

-- no debconf information

