But why is ls able to match the files when rm is not able to remove them?
Is it perhaps because ls is not actually doing any operations on the files
themselves (not even a stat?), and just reporting the dirent->d_name strings
that it got from readdir()? In which case "ls -l *" would fail on the same
files even when "ls *" doesn't?
Or is there something deeper whereby stat() succeeds but unlink() fails?
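
One way to test the first hypothesis is a small probe that prints the names readdir() hands back and then tries lstat() on each of them. This is only a sketch, assuming a POSIX system; probe_dir is a hypothetical helper name, not anything from ls or BusyBox. It never calls unlink(), so nothing is removed.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: walk `dir` using only the names returned by
 * readdir(), then try lstat() on each name. Returns the number of
 * entries seen (excluding "." and ".."), or -1 if the directory cannot
 * be opened. *stat_failures receives the number of names that
 * readdir() reported but that could not be looked up. */
int probe_dir(const char *dir, int *stat_failures)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int seen = 0;

    *stat_failures = 0;
    if (!d)
        return -1;

    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        int n;

        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        seen++;

        /* The name alone is enough to print a bare listing. */
        printf("name: %s\n", e->d_name);

        /* Anything beyond the name needs a path lookup, which can fail. */
        n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path) {
            (*stat_failures)++;
            continue;
        }
        if (lstat(path, &st) != 0) {
            perror(path);
            (*stat_failures)++;
        }
    }
    closedir(d);
    return seen;
}
```

If probe_dir() reports entries but a non-zero failure count on the affected directory, the names are visible to readdir() while lookups by name fail, which would point at the file system or mount rather than at rm.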
On 2014-05-29 18:32, Rich Felker wrote:
On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
Hi Rich !
I know this problem very well. It happens every few
months that I get a ZIP-packaged file from a Windows system.
As the maintainer is a bit stupid, he can't manage to avoid
foreign characters and I end up with unusual file names after
unzip.
This sounds like a bug in the unzip utility. If it encounters
byte sequences which are not UTF-8, it should convert them from
whatever legacy encoding they're in to UTF-8, possibly issuing
an error that the user needs to specify this encoding if it
can't be determined.
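
The first step described here, deciding whether a stored name is already UTF-8, can be sketched as a well-formedness check. This is an illustration only, not code from any unzip implementation; is_valid_utf8 is a hypothetical name. It rejects overlong forms, surrogates and values above U+10FFFF.

```c
#include <stddef.h>

/* Returns 1 if the n bytes at s are well-formed UTF-8, otherwise 0. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t len, k;

        if (c < 0x80) {            /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;              /* stray continuation or invalid lead byte */
        }

        if (n - i < len)
            return 0;              /* sequence truncated at end of input */

        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;          /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        i += len;
    }
    return 1;
}
```

A name such as "caf\xE9" (Latin-1) fails this check, which is the point at which a tool would have to convert from the legacy encoding or ask the user which encoding to assume.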
Then you need to consider buggy all programs that don't
mangle the file names. There are so many programs which just
copy filenames through and let the kernel decide what to do. And
I do not mean BB unzip here, normally I'm using the upstream
unzip.
... and how can you assume all names are UTF-8? Nowadays
maybe, but what about 8-bit locales with different
charsets? UTF-8 mangling would be wrong on those.
My statement was imprecise; of course to support users still stuck on
legacy locales, nl_langinfo(CODESET) should be consulted.
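
For reference, consulting nl_langinfo(CODESET) looks roughly like this. It is a minimal sketch, assuming a POSIX system; locale_codeset and codeset_is_utf8 are hypothetical helper names, and the exact codeset spelling returned differs between C libraries.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Hypothetical helper: name of the character encoding selected by the
 * user's environment (LANG, LC_ALL, LC_CTYPE). */
const char *locale_codeset(void)
{
    /* Without this call the program stays in the "C" locale. */
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: does this codeset name denote UTF-8? The
 * spelling varies between systems, so compare case-insensitively and
 * accept "utf8" as well. */
int codeset_is_utf8(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0
        || strcasecmp(codeset, "UTF8") == 0;
}
```

A tool reading names from a foreign source could call codeset_is_utf8(locale_codeset()) once at startup and pick the conversion target from the result.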
... and not only unzip may produce such results. Think of using
a USB stick on a Windows machine, then carrying it over to a
Linux machine.
The filenames are stored in UCS-2. No problem.
Depending on how the file system is mounted you
may get unusual file names when copying names with foreign
characters. Now who is bad?
If you mount it incorrectly, then this is user error. Note that
correct versus incorrect does not depend on the contents of the
storage device, only on the encoding used by the local system
where you're mounting it.
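
To illustrate why the stored form is unambiguous: each UCS-2 code unit maps to exactly one UTF-8 sequence. The sketch below assumes the local encoding is UTF-8, which is an assumption for the example and not something stated in this thread. ucs2_to_utf8 is a hypothetical helper, not the kernel's conversion code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n UCS-2 code units as UTF-8 into out, whose capacity outsz
 * includes the terminating NUL. Returns the number of bytes written
 * (excluding the NUL), or (size_t)-1 if out is too small or the input
 * contains a surrogate code unit, which is not valid UCS-2. */
size_t ucs2_to_utf8(const uint16_t *in, size_t n, char *out, size_t outsz)
{
    size_t o = 0, i;

    for (i = 0; i < n; i++) {
        uint16_t c = in[i];
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;

        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;
        if (outsz - o <= need)      /* keep one byte free for the NUL */
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (char)c;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    if (o >= outsz)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

On a system using a legacy 8-bit locale the target encoding would differ, and some names would have no representation at all.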
It would be nice to have them all fixed, and fixed the same
way when doing some mapping, but can that ever reach all
programs? This is such a long-standing problem that nobody
really cares.
Not all programs are affected. Only programs which read filenames as
byte strings from foreign sources (such as the directory table of a
zip file) are affected.
Rich
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
--
"'tis an ill wind that blows no minds."