Hi,

after switching to an SMP system, I sporadically get a broken /etc/mtab again. I analyzed the actual code and traced the problem down to lock_mtab() in mount/fstab.c. This function has been patched, improved and repaired over and over again, and still today each distribution ships its own patches to get it fixed.
After some days of analyzing it, I came to the conclusion that there is a fundamental problem in getting the locking to work safely with the current concept (or I do not understand how it is supposed to work, and someone can explain it to me). The strategy seems to be to place an F_WRLCK on mtab~ before modifying mtab itself. If no mtab~ is found, a new one is created and finally removed again. Thus the creation/deletion of the file is independent of the flock itself.

I think this is the root of all problems: either the placement of the flock is the semaphore, in which case the lock file should never be deleted; or the creation/existence of the lock file is the semaphore, in which case the flock helps to notify waiting mount processes but is not the semaphore itself.

There was a major patch around util-linux-2.9i to improve this, but besides adding a lot of very confusing if/else/while(ntry++<5) constructs, it does not solve the problem in principle. In addition, there are a couple of patches around to handle additional races, but they make things even more complicated. This broken concept has also gone into other packages, e.g. mount.nfs. With all solutions currently around, it remains possible to place an flock on a foreign mtab~ file, which may get removed by someone else before the update of mtab itself is finished :-(

A first fix would be to prevent other processes from placing an flock on a newly created mtab~. This can be achieved by placing the flock directly on the "linktargetfile" before the hard link is created. The flock should be inherited by the link, and there is no reason to close and reopen the file. Thus I can (and must) hold the flock until I have finished modifying mtab and have finally deleted mtab~ again.

The problematic case arises if some mtab~ already exists: if I place an F_SETLKW and succeed, I assume some other mount process has finished.
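To make the first fix more concrete, a minimal sketch in C might look like the following. It is not the existing lock_mtab() code: the function names lock_then_link() and unlock_and_remove() are made up for illustration, and the sketch assumes the caller passes a private link target name (something like mtab~.<pid>) plus the shared lock file name. It only covers the case where no mtab~ exists yet; if one does, it simply fails with EEXIST and leaves the decision to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of util-linux): create a private link
 * target, lock it FIRST, and only then publish it under the lock file
 * name with link(2). Nobody else can lock the new mtab~ before we do,
 * because it does not exist under that name until we already hold the lock.
 *
 * Returns an open, locked descriptor on success, or -1 with errno set
 * (EEXIST from link() means a lock file already exists).
 */
int lock_then_link(const char *target, const char *lockfile)
{
    struct flock fl;
    int fd, saved;

    /* O_EXCL: refuse to reuse a leftover target file (assumption). */
    fd = open(target, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: whole file */

    /* The file is still private, so a non-blocking F_SETLK is enough. */
    if (fcntl(fd, F_SETLK, &fl) < 0)
        goto fail;

    /* Publish: the lock sits on the file itself, so the new name has it. */
    if (link(target, lockfile) < 0)
        goto fail;

    unlink(target);             /* the private name is no longer needed */
    return fd;                  /* keep open: closing would drop the lock */

fail:
    saved = errno;
    close(fd);
    unlink(target);
    errno = saved;
    return -1;
}

/*
 * Hypothetical counterpart: remove our own lock file while still holding
 * the lock, then close the descriptor, which releases the lock.
 */
void unlock_and_remove(int fd, const char *lockfile)
{
    unlink(lockfile);
    close(fd);
}
```

A caller would keep the returned descriptor open for the whole mtab update and call unlock_and_remove() only afterwards. The descriptor is never closed and reopened in between, so there is no window in which mtab~ exists without our lock on it.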
Even after a successful F_SETLKW, I may still find an mtab~, for several reasons:

a) the mtab~ file was not locked at all, because it was an old bogus mtab~ accidentally left behind, or
b) it was created by some other tool that does not perform locking the way we expect (e.g. umount.cifs or amd), or
c) the mtab~ was properly released and deleted, but someone else was faster than me and created a new mtab~ file before I woke up.

One may think of just deleting the file in case a), but this again opens a race: if two processes come to the same conclusion, the first one really deletes the bogus file, while the second one may accidentally delete the mtab~ newly created by the first process. Maybe it is safer never to delete any foreign mtab~ that we did not create ourselves. Instead, it should be treated as a fatal error if it does not vanish after some timeout.

I hope to propose a better lock_mtab() solution soon, once I have found some time to test it.

Regards,
Dieter.
