Hello,

On some of the SMP processing nodes in our cluster we are noticing the
following odd behavior.  It looks like there may be a race condition
somewhere in automount that results in the same (in this case NFS)
device being mounted twice on the same mountpoint.

In our case we have a (closed-source, vendor-provided) data processing
app that runs 2-4 processes at a time on each of these nodes.  The
processes communicate via MPI.  What ends up happening is that each of
them tries to read data from these NFS-mounted volumes at exactly the
same time, and sometimes (on about one node in ten) we get unlucky
and the disk gets double-mounted.
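For anyone who wants to spot affected nodes without eyeballing df, a
quick sketch (my own, not anything from the automount tools) is to look
for device+mountpoint pairs that appear more than once in /proc/mounts;
the function name here is made up:

```shell
#!/bin/sh
# find_dupes: print any device+mountpoint pair that occurs more than
# once in a mount table such as /proc/mounts.  Relies only on the
# standard "device mountpoint fstype options 0 0" line format.
find_dupes() {
    awk '{print $1, $2}' "$1" | sort | uniq -d
}

# Usage on a live node:
#   find_dupes /proc/mounts
# Any output means that node has a double mount.
```

Run across the cluster (rsh/ssh loop), this makes it easy to count how
often the race actually hits.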

Here is the entry from the messages file where the disks are getting
mounted:
Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
/etvf/data0
Nov  2 16:52:53 fir32 automount[674]: attempting to mount entry
/etvf/data0

(Yes, there are two of them.)

/proc/mounts looks as follows:

rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
automount(pid626) /etvp autofs rw 0 0
automount(pid674) /etvf autofs rw 0 0
automount(pid695) /nova autofs rw 0 0
automount(pid589) /home autofs rw 0 0
automount(pid601) /etve autofs rw 0 0
automount(pid649) /etvo autofs rw 0 0
odin:/export/users /home/users nfs
rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0
pecan:/etvp/data8 /etvp/data8 nfs
rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=pecan 0 0
fenris:/etvf/data0 /etvf/data0 nfs
rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
fenris:/etvf/data0 /etvf/data0 nfs
rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=fenris 0 0
odin:/export/prog /home/prog nfs
rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=odin 0 0

The mount in question is "fenris:/etvf/data0".  (We have an automount
process running for each of our big disk servers.  Each has a different,
NIS-provided map of disks to serve.)

Something odd, possibly related: when you run 'df', you get a strange
message:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda3             74754492   1524216  69432912   3% /
/dev/hda1               101089      6976     88894   8% /boot
none                   2069232         0   2069232   0% /dev/shm
df: `/tmp/autofs-bind-3fa390d2-259/dir2': No such file or directory
odin:/export/users    44038844  39681000   4357844  91% /home/users
pecan:/etvp/data8    872779558 682596377 181455386  79% /etvp/data8
fenris:/etvf/data0   1662282384 1409676396 168166904  90% /etvf/data0
fenris:/etvf/data0   1662282384 1409676396 168166904  90% /etvf/data0
odin:/export/prog     31456316  18282432  13173884  59% /home/prog

This is automount 3.1.7 as provided in Red Hat 8.0.  We are running a
2.4.20 kernel patched with Trond Myklebust's NFS client patches and
support for Broadcom's gigabit ethernet cards.

Any help or suggestions would be appreciated.  If the problem is fixed
in the autofs4 client tools I'll be happy to try them and report back.
Since this is a cluster, though, I'm reluctant to commit to upgrading
all of the machines without some idea of whether it'll make a
difference.

Oh -- the reason we care: based on anecdotal evidence, nodes that do
this double-mount run their processing jobs much more slowly than those
that don't.  I suspect the cause is some negative effect on caching due
to the duplicated mount.  In any event, though, it does seem like a bug.

---
Matthew Mitchell
Systems Programmer/Administrator
Geophysical Development Corporation

_______________________________________________
autofs mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/autofs