On Wed, 22 May 2019 at 21:15, <[1]sf...@users.sourceforge.net> wrote:

> Kirill Kolyshkin:
> > Can you tell me what your reason is for not retrying the read by
> > default? The code has just checked that this is an aufs mount, so it
> > should definitely be present in /proc/mounts. Unless, of course, this
> > mount was unmounted by someone in between statfs() and the read. If
> > you have this exact case in mind (I can't think of anything else) and
> > don't want to retry for efficiency reasons, you can add another
> > statfs() after reading /proc/mounts and not finding the mount -- that
> > way you can be sure that the mount is still there but it eluded
> > /proc/mounts.
>
> Yes, such a race was in my mind. In other words, it is hard to identify
> the reason why /proc/mounts doesn't show the entry: is it a problem of
> /proc/mounts, or did someone else unmount it? The only way to make sure
> is to retry reading /proc/mounts. Or do stat+statfs (exactly as you
> suggested below) to check that this is indeed the aufs root directory
> (and then you still need to retry reading /proc/mounts).
>
> Additionally, I guess (hope) such parallel mounts/unmounts are rare. And
> I wonder, is "2" the absolutely correct solution? Can "3" never happen?

If you're asking about the number of retries, I chose 3 in my patch just
to be on the safe side. Assume the probability of not finding a specific
mount is 0.01, or 1%; then the chance that a second attempt also fails is
0.01^2 = 0.0001, or 0.01%, and with three attempts it is 0.01^3, or
0.0001%. In practice, I never saw a case in my tests where the second
retry didn't work, but maybe we can do more testing to figure out what a
good number of retries should be.

> Statfs(), you say, won't help, I am afraid. Even if it tells us that the
> dir is aufs, that is not proof of the aufs mountpoint; it can be a
> subdir of another aufs mount. An extra stat(2) call may help at this
> point: it will tell us the inode number, and if it is AUFS_ROOT_INO,
> then that path is the aufs mountpoint. But I wonder, do we really have
> to issue stat(2) and statfs(2) just to make sure the aufs mount is still
> there? Isn't it rather heavy and racy?

It is racy, though not very heavy, and I suggest doing it only after we
have tried to find the mount in /proc/mounts and failed, before retrying.
Reading /proc/mounts is super heavy compared to a couple of syscalls, so
it looks like a decent price to pay to avoid re-reading it.
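In Go, that check could look something like this (an untested sketch: the
magic and root-inode values are the ones aufs_type.h defines, and
isAufsMountpoint is just a name I made up):

    package aufs

    import "syscall"

    const (
        aufsSuperMagic = 0x61756673 // "aufs" in ASCII; what statfs(2) reports for aufs
        aufsRootIno    = 2          // AUFS_ROOT_INO from aufs_type.h
    )

    // isAufsMountpoint reports whether path is the root of an aufs mount.
    // statfs(2) alone is not enough, because a subdirectory of an aufs
    // mount also reports the aufs magic; stat(2) additionally checks that
    // the inode is the aufs root inode.
    func isAufsMountpoint(path string) (bool, error) {
        var sfs syscall.Statfs_t
        if err := syscall.Statfs(path, &sfs); err != nil {
            return false, err
        }
        if sfs.Type != aufsSuperMagic {
            return false, nil
        }
        var st syscall.Stat_t
        if err := syscall.Stat(path, &st); err != nil {
            return false, err
        }
        return st.Ino == aufsRootIno, nil
    }

The caller would run this only after a /proc/mounts read came up empty:
false means the mount is really gone, true means /proc/mounts missed it
and a retry is worthwhile.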
> > I have also taken a deeper look at that other error I mentioned
> > earlier, and found out it's a race in au_xino_create(). In case the
> > xino mount option is provided (we use xino=/dev/shm/aufs.xino in
> > Docker), and multiple mounts (note: for different mount points) are
> > performed in parallel, one mount can reach this line of code:
> >
> >     file = vfsub_filp_open(fpath, O_RDWR | O_CREAT | O_EXCL | O_LARGEFILE, 0666);
> >
> > while another mount has already created that file, but hasn't unlinked
> > it yet. As a result, we get errors like these in the kernel log:
> >
> >     [2233986.956753] aufs au_xino_create:767:dockerd[17144]: open /dev/shm/aufs.xino(-17)
> >     [2233988.732636] aufs au_xino_create:767:dockerd[17518]: open /dev/shm/aufs.xino(-17)
>
> Thank you very much for the report. Here -17 means EEXIST, the "File
> exists" error. It is expected behaviour (and I am glad to know it is
> working as expected). As you might know, the default path of the XINO
> files is the top dir of the first writable branch, and a writable branch
> is not sharable between multiple aufs mounts. So by default the XINO
> files are dedicated to a single aufs mount: not shared, so no conflict
> happens.

This is correct for read-write mounts, but what about read-only ones? The
doc says that if there is no writable branch, the path used will be
/tmp/.aufs.xino. Sure, the aufs kernel module code creates and then
removes this file, so another mount will create another file (with the
same name), but I see a clear possibility of a race in this case (such as
multiple read-only mounts happening at the same time).

> > Currently, I am working around this unfortunate issue by calling
> > mount(2) under an exclusive lock, to make sure no two aufs mounts
> > (again, for different mount points) are performed in parallel, but
> > perhaps there is a better way?
> >
> > I am going to mitigate this race by adding a random suffix to the xino
> > file name; do you think it is a decent workaround?
>
> If your first writable branch is somewhere on /dev/shm, then you can
> remove the "xino=" option. In this case, the XINO files will be created
> under /dev/shm and not shared. Moreover, the "xino=" option is generally
> something of a last resort. You may want "xino=" when the filesystem of
> your first writable branch doesn't support XINO, or when you want a
> small gain in the aufs internal XINO handling; otherwise you can omit
> it.

My guess is that whoever wrote docker's aufs support thought that using
tmpfs for the xino file is faster. Or maybe they saw advice (in some old
documentation, such as [2]http://aufs.sourceforge.net/aufs2/brsync/README.txt)
to never put the xino file on an SSD (I'm sure many docker users use
SSDs).

> Of course, adding a random/unique suffix is a good idea. If I were you,
> I'd use $$ in the shell-script manner, like:
>
>     mount -t aufs -obr=...,xino=/dev/shm/aufs.xino.$$ ...

My code is in Go, so I'm patching it this way:

    -    opts := "dio,xino=/dev/shm/aufs.xino"
    +    rnd := strconv.FormatInt(int64(1e9+rand.Uint32()%1e9), 36)[1:5]
    +    opts := "dio,xino=/dev/shm/aufs."+rnd

(rnd is four random characters in the 0-9a-z range)
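For completeness, here is the same trick as a self-contained snippet (my
sketch; xinoOpts is a made-up name):

    package main

    import (
        "fmt"
        "math/rand"
        "strconv"
    )

    // xinoOpts builds the aufs option string with a randomized xino file
    // name, so that parallel mounts don't collide on the same /dev/shm
    // path. On Go versions before 1.20, math/rand should be seeded once
    // at startup, or every process will produce the same sequence.
    func xinoOpts() string {
        // int64(1e9+rand.Uint32()%1e9) is always a six-digit number in
        // base 36; the [1:5] slice keeps four of those digits, i.e. four
        // characters in the 0-9a-z range.
        rnd := strconv.FormatInt(int64(1e9+rand.Uint32()%1e9), 36)[1:5]
        return "dio,xino=/dev/shm/aufs." + rnd
    }

    func main() {
        fmt.Println(xinoOpts()) // e.g. dio,xino=/dev/shm/aufs.k3x7
    }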
References

1. mailto:sf...@users.sourceforge.net
2. http://aufs.sourceforge.net/aufs2/brsync/README.txt