On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
> Hi Rich !
> 
> >> I know this problem very well. Every few months I get a
> >> ZIP-packaged file from a Windows system.
> >> As the maintainer is a bit stupid, he can't manage to avoid
> >> foreign characters and I end up with unusual file names after
> >> unzip.
> >
> >This sounds like a bug in the unzip utility. If it encounters
> >byte sequences which are not UTF-8, it should convert them from
> >whatever legacy encoding they're in to UTF-8, possibly issuing
> >an error that the user needs to specify this encoding if it
> >can't be determined.
> 
> Then you would have to consider every program buggy that doesn't
> mangle the file names. There are so many programs which just
> pass filenames through and let the kernel decide what to do. And
> I don't mean BB unzip here; normally I use the upstream
> unzip.
> 
> ... and how can you assume all names are UTF-8? Nowadays, maybe,
> but what about 8-bit locales with different charsets? Converting
> to UTF-8 would be wrong on those.

My statement was imprecise; of course, to support users still stuck on
legacy locales, nl_langinfo(CODESET) should be consulted for the target
encoding instead of assuming UTF-8.
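A minimal sketch of the approach described above, written against POSIX `iconv` and `nl_langinfo(CODESET)`. The helper names (`is_valid_utf8`, `convert_name`, `localize_name`) are hypothetical, and the detection rule (bytes that decode as valid UTF-8 are treated as UTF-8, anything else is assumed to be in a user-supplied legacy encoding) is an assumption for illustration, not a description of how any existing unzip works. The caller is expected to have run `setlocale(LC_CTYPE, "")` so that `nl_langinfo` reports the user's codeset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if str[0..len) is well-formed UTF-8, 0 otherwise.
   Overlong forms, UTF-16 surrogates and values above U+10FFFF are rejected. */
static int is_valid_utf8(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes */
        unsigned long cp;  /* decoded code point */
        if (c < 0x80) { i++; continue; }
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return 0;     /* stray continuation byte or invalid lead byte */
        if (len - i - 1 < n) return 0;   /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;
        i += n + 1;
    }
    return 1;
}

/* Convert NUL-terminated `name` from encoding `from` to encoding `to`.
   Returns a malloc'd string, or NULL with errno set (e.g. EILSEQ). */
static char *convert_name(const char *name, const char *from, const char *to)
{
    assert(name && from && to);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) return NULL;
    size_t inleft = strlen(name);
    size_t cap = inleft * 4 + 1;       /* generous bound for filename sizes */
    char *out = malloc(cap);
    if (!out) { iconv_close(cd); return NULL; }
    char *in = (char *)name;           /* iconv() takes char **, not const */
    char *op = out;
    size_t outleft = cap - 1;          /* keep one byte for the terminator */
    if (iconv(cd, &in, &inleft, &op, &outleft) == (size_t)-1) {
        int saved = errno;
        free(out);
        iconv_close(cd);
        errno = saved;
        return NULL;
    }
    iconv(cd, NULL, NULL, &op, &outleft);  /* flush any shift state */
    *op = '\0';
    iconv_close(cd);
    return out;
}

/* ASSUMPTION: names that decode as UTF-8 are UTF-8; everything else is in
   `legacy` (which a real tool would let the user specify). The result is
   in the local codeset reported by nl_langinfo(CODESET). */
static char *localize_name(const char *name, const char *legacy)
{
    const char *local = nl_langinfo(CODESET);
    const char *from = is_valid_utf8(name, strlen(name)) ? "UTF-8" : legacy;
    if (strcmp(from, local) == 0) return strdup(name);
    return convert_name(name, from, local);
}
```

For example, `localize_name("caf\xE9", "ISO-8859-1")` would be converted from ISO-8859-1 because the lone `0xE9` byte is not valid UTF-8. The UTF-8 heuristic can misfire on short legacy names that happen to form valid UTF-8, which is why asking the user for the encoding remains the fallback.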

> ... and unzip is not the only program that can produce such results.
> Think of using a USB stick on a Windows machine, then carrying it
> over to a Linux machine.

The filenames are stored in UCS-2. No problem.
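To illustrate why UCS-2 names are unambiguous, here is a sketch of turning UCS-2 code units into UTF-8. The function name `ucs2_to_utf8` is hypothetical; it assumes the 16-bit units have already been read into host byte order, and it treats the input strictly as UCS-2 as stated above, so any surrogate unit is rejected. Names written by newer Windows versions may contain UTF-16 surrogate pairs, which this sketch does not handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n UCS-2 code units as UTF-8 into out, which must hold at least
   3 * n + 1 bytes. Returns the number of bytes written (excluding the NUL),
   or (size_t)-1 if a surrogate unit is found. */
static size_t ucs2_to_utf8(const uint16_t *in, size_t n, char *out)
{
    assert(in != NULL || n == 0);
    assert(out != NULL);
    unsigned char *o = (unsigned char *)out;
    for (size_t i = 0; i < n; i++) {
        uint16_t c = in[i];
        if (c < 0x80) {                          /* 1 byte: ASCII */
            *o++ = (unsigned char)c;
        } else if (c < 0x800) {                  /* 2 bytes */
            *o++ = (unsigned char)(0xC0 | (c >> 6));
            *o++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c >= 0xD800 && c <= 0xDFFF) { /* not valid UCS-2 */
            return (size_t)-1;
        } else {                                 /* 3 bytes */
            *o++ = (unsigned char)(0xE0 | (c >> 12));
            *o++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *o++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *o = '\0';
    return (size_t)(o - (unsigned char *)out);
}
```

Because every UCS-2 unit maps to exactly one Unicode character, the only open question on the Linux side is which local encoding to convert into.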

> Depending on how the file system is mounted, you
> may get unusual file names when copying names with foreign
> characters. Now who is at fault?

If you mount it incorrectly, then this is user error. Note that
correct versus incorrect does not depend on the contents of the
storage device, only on the encoding used by the local system where
you're mounting it.
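A sketch of choosing the mount option purely from the local codeset, independent of what is on the device. The helper `vfat_name_option` and its small lookup table are hypothetical and deliberately partial; the option spellings (`utf8`, `iocharset=<nls name>`) follow the Linux vfat mount options, but which NLS tables are available depends on the kernel configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a codeset name (as returned by
   nl_langinfo(CODESET)) to a vfat mount option that presents the on-disk
   UCS-2 names in that codeset. Returns 0 and fills `out`, or -1 if the
   codeset is not in this partial table or `out` is too small. */
static int vfat_name_option(const char *codeset, char *out, size_t outlen)
{
    static const struct { const char *codeset, *option; } map[] = {
        { "UTF-8",          "utf8" },  /* vfat's dedicated UTF-8 option */
        { "ANSI_X3.4-1968", "iocharset=ascii" },
        { "ISO-8859-1",     "iocharset=iso8859-1" },
        { "ISO-8859-15",    "iocharset=iso8859-15" },
        { "KOI8-R",         "iocharset=koi8-r" },
    };
    assert(codeset && out);
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (strcmp(codeset, map[i].codeset) == 0) {
            int n = snprintf(out, outlen, "%s", map[i].option);
            return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
        }
    }
    return -1;
}
```

The resulting string could be passed as the `-o` argument to mount or as the data argument of mount(2); the point is that the input is only the local codeset, never anything read from the stick.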

> It would be nice to have them all fixed ... all fixed the same
> way when doing some mapping ... but can that ever reach all
> programs? This is such a long-standing problem that nobody really
> cares.

Not all programs are affected. Only programs which read filenames as
byte strings from foreign sources (such as the directory table of a
zip file) are affected.
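As an example of reading names from such a foreign source, here is a sketch that pulls the file name out of one ZIP central directory record and reports which encoding the archive claims for it. It relies on the PKWARE APPNOTE layout (46-byte fixed header, general purpose flags at offset 8, name length at offset 28) and on general purpose bit 11, which marks the name as UTF-8; without that bit the spec calls for IBM code page 437. The function `zip_cdir_name` is hypothetical, and in practice many Windows archivers write the system's OEM code page instead of CP437, which is exactly when a user-supplied encoding is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZIP_FLAG_UTF8 0x0800u  /* general purpose bit 11: name is UTF-8 */

static uint16_t rd16(const unsigned char *p)  /* little-endian 16-bit */
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: parse one central directory file header of `len`
   bytes. On success returns 0 and sets the raw name bytes, their length,
   and the encoding the archive claims ("UTF-8" or "CP437"). */
static int zip_cdir_name(const unsigned char *rec, size_t len,
                         const unsigned char **name, size_t *name_len,
                         const char **encoding)
{
    assert(rec && name && name_len && encoding);
    if (len < 46 || memcmp(rec, "PK\1\2", 4) != 0)  /* 0x02014b50 */
        return -1;
    uint16_t flags = rd16(rec + 8);
    size_t n = rd16(rec + 28);
    if (len - 46 < n)
        return -1;                                  /* truncated record */
    *name = rec + 46;                               /* not NUL-terminated */
    *name_len = n;
    *encoding = (flags & ZIP_FLAG_UTF8) ? "UTF-8" : "CP437";
    return 0;
}
```

The returned bytes and encoding name are what a tool would then hand to a conversion into the local codeset.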

Rich
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
