On 2007-03-09 00:44:55 +0100, Jim Meyering wrote:
> Realize that for most people (everyone except you, afaik),
> rm works just fine.

Yes, for most people, rm works fine. But the problem exists (I had it
on 3 different NFS servers in the past few years). And for your
information, other users have reported the same problem, e.g.

  http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=994291&admit=-682735245+1173400109463+28353475

where some user posted the problem, and another user replied he had
the same problem ("I have a full encapsulated procedure which stops
when rm fails. Very annoying problem [...] While my system Is in
production it is pretty hard to turn off EMC or nfs server cache,
cause it would dracstically impact performances."). See? Other users
find this problem annoying, and though there may be a solution on the
NFS side (turning off the cache), such a solution is not reasonable.

> Please step back a moment and consider whether you have an unusual
> NFS setup, since you are the only one to report such a problem.

Correction: I'm the only one who has reported it at the right place
(well, perhaps not the right place, seeing how this problem is
considered here...). It is well known that most users don't report
bugs, or report them somewhere else, more likely while searching for
an immediate workaround. I sometimes do the same. You can see here the
first time I had this problem (this was with GNU fileutils 4.0p, in
2001):

  http://groups.google.com/group/fr.comp.os.unix/browse_thread/thread/2e526832a2f3947d/

Also note that the problem occurred much more frequently with the
coreutils snapshot (6.8+) than with the current Debian version (5.97).
And I doubt that many people use the snapshot version. And I'm also
one of those who use the machines the most intensively (I'm often the
only one to report bugs, but they are sometimes eventually identified
and fixed).

> You should start by trying to reproduce the failure using stock versions
> of client and server kernels, tools, etc.

The problem is that as the user, I can't choose.
But FYI, the client is a Debian/testing (in fact, because Debian/stable
doesn't exist for x86_64), so, not really old. The server is however
quite old, but the sysadmins don't want to upgrade it as they are not
sure that it will still work... (This is not surprising!) And it will
be replaced by a new server under AFS, but this is not for the short
term.

> Better still, write a script that will demonstrate the problem,
> given a small number of inputs (e.g., directory, hostname) and ask
> people to run it and report any problem they see.

The problem is that it is difficult to reproduce under different
conditions, in particular if the number of inputs is small. BTW, I can
no longer reproduce the problem with my testcase that was 100%
reproducible a few days ago (though under the same conditions on my
side, and the machines haven't rebooted). It probably depends on the
load of the machine or the network (as is very often the case when a
bug depends on race conditions).

> I admit that the "rm skips rmdir" may be technically contrary to POSIX,
> but unless there's a more realistic way to trigger misbehavior, then I
> won't try to change it. However, if you develop a clean, non-invasive
> patch to make rm conform to the letter of POSIX, and add a test script,
> I'll consider it.

A suggestion concerning the "rm skips rmdir": Consider that ENOENT
errors should not block rmdir (while other errors still should).
Indeed such an "error" doesn't mean that an existing file couldn't be
unlinked, just that the file didn't exist. And to implement that, only
an additional flag is necessary, isn't it? (But I haven't looked at
the coreutils source very deeply.)

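To make the ENOENT suggestion concrete, here is a minimal sketch of the
logic in Python. It is not the coreutils code (which is C) and not a
patch; remove_tree, its unlink parameter and the "blocked" flag are
hypothetical names used only for illustration. The assumption is that
a directory listing may contain an entry that is already gone by the
time unlink is called, as with a stale NFS cache.

```python
import errno
import os


def remove_tree(path, unlink=os.unlink):
    """Recursively remove `path`; return True if it was fully removed.

    ENOENT from unlink is treated as "already gone", so it does not
    prevent the later rmdir of the parent. Any other error does.
    """
    blocked = False  # the "additional flag": set only by non-ENOENT errors
    for name in os.listdir(path):
        child = os.path.join(path, name)
        if os.path.isdir(child) and not os.path.islink(child):
            if not remove_tree(child, unlink):
                blocked = True
            continue
        try:
            unlink(child)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                blocked = True  # a real failure: the entry may still exist
    if blocked:
        return False  # skip rmdir, as rm does today after any failure
    try:
        os.rmdir(path)
    except OSError:
        return False
    return True
```

The unlink parameter exists only so that a failing unlink can be
simulated in a test; a real implementation would of course also report
the errors instead of just returning a boolean.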
-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
