On Wed, 22 May 2019 at 21:15, <[1]sf...@users.sourceforge.net> wrote:

> Kirill Kolyshkin:
> > Can you tell me what your reason is for not retrying the read by
> > default? The code has just checked that this is an aufs mount, so it
> > should definitely be present in /proc/mounts. Unless, of course, this
> > mount was unmounted by someone in between statfs() and the read. If
> > you have this exact case in mind (I can't think of anything else) and
> > don't want to retry for efficiency reasons, you can add another
> > statfs() after reading /proc/mounts and not finding the mount -- that
> > way you can be sure that the mount is still there but it eluded
> > /proc/mounts.
>
> Yes, such a race was in my mind. In other words, it is hard to identify
> the reason why /proc/mounts doesn't show the entry: is it a problem of
> /proc/mounts, or did someone else unmount it? The only way to make sure
> is to retry reading /proc/mounts. Or do stat+statfs (exactly as you
> suggested below) to check that this is indeed the aufs root directory
> (and then you still need to retry reading /proc/mounts).
>
> Additionally, I guess (hope) such parallel mounts/unmounts are rare. And
> I wonder, is "2" the absolutely correct solution? Can "3" never happen?

If you're asking about the number of retries, I chose 3 in my patch just
to be on the safe side. Assume the probability of not finding a specific
mount is 0.01, or 1%; then the chance that a second attempt also fails is
0.01^2 = 0.0001, or 0.01%, and with three attempts it is 0.01^3, or
0.0001%. In practice, I never saw a case in my tests where the second
retry didn't work, but maybe we can do more testing to figure out what a
good number of retries should be.

> Statfs(), you say, won't help, I am afraid. Even if it tells us that the
> dir is aufs, that is not proof of the aufs mountpoint; it can be a
> subdir of another aufs mount. An extra stat(2) call may help at this
> point: it will tell us the inode number, and if it is AUFS_ROOT_INO,
> then that path is the aufs mountpoint. But I wonder, do we really have
> to issue stat(2) and statfs(2) just to make sure the aufs mount is still
> there? Isn't it rather heavy and racy?

It is racy, though not very heavy, and I suggest doing it only after we
have tried to find the mount in /proc/mounts and failed, before retrying.
Reading /proc/mounts is super heavy compared to a couple of syscalls, so
it looks like a decent price to pay to avoid re-reading it.
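In Go, that check could look something like this (an untested sketch: the
magic and root-inode values are the ones aufs_type.h defines, and
isAufsMountpoint is just a name I made up):

    package aufs

    import "syscall"

    const (
        aufsSuperMagic = 0x61756673 // "aufs" in ASCII; what statfs(2) reports for aufs
        aufsRootIno    = 2          // AUFS_ROOT_INO from aufs_type.h
    )

    // isAufsMountpoint reports whether path is the root of an aufs mount.
    // statfs(2) alone is not enough, because a subdirectory of an aufs
    // mount also reports the aufs magic; stat(2) additionally checks that
    // the inode is the aufs root inode.
    func isAufsMountpoint(path string) (bool, error) {
        var sfs syscall.Statfs_t
        if err := syscall.Statfs(path, &sfs); err != nil {
            return false, err
        }
        if sfs.Type != aufsSuperMagic {
            return false, nil
        }
        var st syscall.Stat_t
        if err := syscall.Stat(path, &st); err != nil {
            return false, err
        }
        return st.Ino == aufsRootIno, nil
    }

The caller would run this only after a /proc/mounts read came up empty:
false means the mount is really gone, true means /proc/mounts missed it
and a retry is worthwhile.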
> > I have also taken a deeper look at that other error I mentioned
> > earlier, and found out it's a race in au_xino_create(). In case the
> > xino mount option is provided (we use xino=/dev/shm/aufs.xino in
> > Docker), and multiple mounts (note: for different mount points) are
> > performed in parallel, one mount can reach this line of code:
> >
> >     file = vfsub_filp_open(fpath, O_RDWR | O_CREAT | O_EXCL | O_LARGEFILE, 0666);
> >
> > while another mount has already created that file, but hasn't unlinked
> > it yet. As a result, we get errors like these in the kernel log:
> >
> >     [2233986.956753] aufs au_xino_create:767:dockerd[17144]: open /dev/shm/aufs.xino(-17)
> >     [2233988.732636] aufs au_xino_create:767:dockerd[17518]: open /dev/shm/aufs.xino(-17)
>
> Thank you very much for the report. Here -17 means EEXIST, the "File
> exists" error. It is expected behaviour (and I am glad to know it is
> working as expected). As you might know, the default path of the XINO
> files is the top dir of the first writable branch, and a writable branch
> is not sharable between multiple aufs mounts. So by default the XINO
> files are dedicated to a single aufs mount: not shared, so no conflict
> happens.

This is correct for read-write mounts, but what about read-only ones? The
doc says that if there is no writable branch, the path used will be
/tmp/.aufs.xino. Sure, the aufs kernel module code creates and then
removes this file, so another mount will create another file (with the
same name), but I see a clear possibility of a race in this case (such as
multiple read-only mounts happening at the same time).

> > Currently, I am working around this unfortunate issue by calling
> > mount(2) under an exclusive lock, to make sure no two aufs mounts
> > (again, for different mount points) are performed in parallel, but
> > perhaps there is a better way?
> >
> > I am going to mitigate this race by adding a random suffix to the xino
> > file name; do you think it is a decent workaround?
>
> If your first writable branch is somewhere on /dev/shm, then you can
> remove the "xino=" option. In this case, the XINO files will be created
> under /dev/shm and not shared. Moreover, the "xino=" option is generally
> something of a last resort. You may want "xino=" when the filesystem of
> your first writable branch doesn't support XINO, or when you want a
> small gain in the aufs internal XINO handling; otherwise you can omit
> it.

My guess is that whoever wrote docker's aufs support thought that using
tmpfs for the xino file is faster. Or maybe they saw advice (in some old
documentation, such as [2]http://aufs.sourceforge.net/aufs2/brsync/README.txt)
to never put the xino file on an SSD (I'm sure many docker users use
SSDs).

> Of course, adding a random/unique suffix is a good idea. If I were you,
> I'd use $$ in the shell-script manner, like:
>
>     mount -t aufs -obr=...,xino=/dev/shm/aufs.xino.$$ ...

My code is in Go, so I'm patching it this way:

    -    opts := "dio,xino=/dev/shm/aufs.xino"
    +    rnd := strconv.FormatInt(int64(1e9+rand.Uint32()%1e9), 36)[1:5]
    +    opts := "dio,xino=/dev/shm/aufs."+rnd

(rnd is four random characters in the 0-9a-z range)
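For completeness, here is the same trick as a self-contained snippet (my
sketch; xinoOpts is a made-up name):

    package main

    import (
        "fmt"
        "math/rand"
        "strconv"
    )

    // xinoOpts builds the aufs option string with a randomized xino file
    // name, so that parallel mounts don't collide on the same /dev/shm
    // path. On Go versions before 1.20, math/rand should be seeded once
    // at startup, or every process will produce the same sequence.
    func xinoOpts() string {
        // int64(1e9+rand.Uint32()%1e9) is always a six-digit number in
        // base 36; the [1:5] slice keeps four of those digits, i.e. four
        // characters in the 0-9a-z range.
        rnd := strconv.FormatInt(int64(1e9+rand.Uint32()%1e9), 36)[1:5]
        return "dio,xino=/dev/shm/aufs." + rnd
    }

    func main() {
        fmt.Println(xinoOpts()) // e.g. dio,xino=/dev/shm/aufs.k3x7
    }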
References

1. mailto:sf...@users.sourceforge.net
2. http://aufs.sourceforge.net/aufs2/brsync/README.txt