Hello, I may have found a somewhat obscure race when moving or copying files on an SMP machine.

fileutils version 4.1 and fileutils 4.1.5 (latest: 1/06/2002)
Linux kernel version 2.4.7-10 (Redhat 7.2)
ext2 filesystem

The problem occurs when more than one process is attempting to mv/cp the same file. For example:

#mkdir test
#cd test
#touch a
#ls -lia
total 8
     57 drwxrwxr-x    2 mgaughen mgaughen     4096 Jan  8 14:04 .
 142739 drwxrwxr-x    7 mgaughen mgaughen     4096 Jan  8 14:04 ..
     59 -rw-rw-r--    1 mgaughen mgaughen        0 Jan  8 14:04 a
#mv a b (Process 1)
#mv a b (Process 2)

To execute the moves at the _same_ time, on my SMP box, I am using a program called hydra. It basically allows me to send commands to multiple login sessions at the same time.

In all cases, I would expect to see one of the processes perform the mv successfully, while the other process fails with this error msg:

mv: cannot stat `a': No such file or directory

However, when the race occurs, this message is produced instead:

mv: `a' and `b' are the same file

Executing another 'ls' gives:

#ls -lia
total 8
     57 drwxrwxr-x    2 mgaughen mgaughen     4096 Jan  8 14:04 .
 142739 drwxrwxr-x    7 mgaughen mgaughen     4096 Jan  8 14:04 ..

File "b" does not exist!

After looking through the mv/cp source code, I found that the problem was in copy_internal(). When the race occurs, both process 1 and process 2 are able to stat(2) the src_path ("a"). Then, process 1 is able to execute the rename(2) first. Process 2 comes along and also attempts the rename. However, "a" doesn't exist anymore, so rename returns ENOENT. copy_internal() assumes that a cross-device 'mv' is being executed, and proceeds to unlink(2) the dst_path ("b"), followed by a call to copy_reg(). However, the call to open(2), in copy_reg(), fails since "a" still doesn't exist. At that point the message:

mv: `a' and `b' are the same file

is printed. The message doesn't make sense in this case, because "a" and "b" are _not_ the same file.

Part of the problem is the ugly race between stat (path lookups) and rename under Linux (and other OSes?!?).
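
For what it's worth, the interleaving described above can be replayed deterministically in a single process, without hydra. The sketch below is Python rather than the fileutils C code, and the function name replay_race and the temporary directory are made up purely for illustration. It only demonstrates the syscall ordering: both "processes" stat "a", the first rename succeeds, and the second rename fails with ENOENT, which is not the EXDEV that rename(2) reports for a genuine cross-device move.

```python
import errno
import os
import tempfile


def replay_race(workdir):
    """Replay the reported interleaving step by step in one process.

    Returns the errno seen by the second rename (0 if it succeeded).
    """
    src = os.path.join(workdir, "a")
    dst = os.path.join(workdir, "b")

    # Equivalent of "touch a".
    with open(src, "w"):
        pass

    # Both processes manage to stat "a" before either renames it.
    os.stat(src)  # process 1
    os.stat(src)  # process 2

    # Process 1 gets to rename(2) first and succeeds.
    os.rename(src, dst)

    # Process 2 now attempts the same rename; "a" is already gone.
    try:
        os.rename(src, dst)
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        err = replay_race(workdir)
        # Print the symbolic errno name for the failed second rename.
        print(errno.errorcode.get(err, err))
```

Note that this replay stops at the failed rename. It does not reproduce the later unlink of "b" that mv performs, so "b" still exists when the function returns.
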
But it seems to me that copy_internal() could be made a bit more robust. If rename returns ENOENT, instead of assuming that a cross-device 'mv' was being attempted (which was not the case), copy_internal() could print an error message and return. Is there a reason why that would be bad?

I am not subscribed to bugs-fileutils, so if you could CC me on any replies, that would be great.

Comments? Flames?

Thanks,
-Mike Gaughen

_______________________________________________
Bug-fileutils mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-fileutils