I have found a system that triggers two failure modes in opal_path_nfs()
that are, as far as I can tell, new.
This is a Linux/PPC64 host, but NOT the BG/P front-end on which I've been
reporting other issues.
This build also uses gcc, not XLC, so this is a "normal" Linux/PPC system.
I'll provide platform details on request, but I don't think they are
relevant to the problems.
The first is the sheer size of the NFS-mounted filesystems (79TB), which
causes the statfs() call in opal_path_nfs() to fail with
errno=EOVERFLOW. In this case the f_type field appears to still be
valid (I printed it out to confirm), but opal_path_nfs() gives up and
incorrectly returns 0, indicating that the path is NOT on a cluster
filesystem. This particular errno value seems like a simple one to add
an exception for.
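A minimal sketch of such an exception follows. It is not the actual
opal_path_nfs() code: path_is_nfs() and statfs_type_usable() are
hypothetical names, and trusting f_type after EOVERFLOW rests only on the
observation above, not on any documented guarantee. The buffer is zeroed
first so that, if statfs() did not fill it in, f_type reads as 0 and the
check falls back to returning false, which is the current behavior.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/vfs.h>

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969 /* value from <linux/magic.h> */
#endif

/* Hypothetical helper: can f_type from a statfs() call be used?
 * ASSUMPTION: on EOVERFLOW, f_type is still filled in, as observed on
 * the host described in this report. */
static bool statfs_type_usable(int rc, int err)
{
    return rc == 0 || err == EOVERFLOW;
}

/* Hypothetical helper, not the real opal_path_nfs(). */
static bool path_is_nfs(const char *path)
{
    struct statfs buf;
    int rc;

    /* Zero the buffer so an unfilled f_type reads as 0 (no match)
     * rather than stack garbage. */
    memset(&buf, 0, sizeof buf);
    do {
        rc = statfs(path, &buf);
    } while (rc != 0 && errno == EINTR);

    int err = (rc == 0) ? 0 : errno;
    if (!statfs_type_usable(rc, err))
        return false; /* any other failure: err toward FALSE, as today */

    return buf.f_type == NFS_SUPER_MAGIC;
}
```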
The second is that the system uses an automounter to mount home
directories over NFS. As a result, those mount points report an f_type
of AUTOFS_SUPER_MAGIC instead of NFS_SUPER_MAGIC. If one is willing to
assume that all automounted filesystems are cluster filesystems, then
the fix in this case should also be relatively simple. Whether that
assumption is acceptable depends on whether opal_path_nfs() should err
toward FALSE or toward TRUE. If its behavior when statfs() fails is
taken as the precedent, then one probably should NOT assume that
AUTOFS_SUPER_MAGIC indicates a cluster filesystem. In that case I have
no suggestion for determining the "real" filesystem type, other than
parsing /proc/mounts (which costs no portability, given that statfs()
is already Linux specific).
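If automounted filesystems are to be treated as cluster filesystems, one
way to keep the TRUE-versus-FALSE decision in a single place is an
explicit policy switch in the type check. This is a sketch only:
fs_type_is_cluster() and its autofs_counts_as_cluster parameter are
hypothetical. The magic numbers are the values from <linux/magic.h>,
defined here only in case that header is not included.

```c
#include <stdbool.h>

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969    /* value from <linux/magic.h> */
#endif
#ifndef AUTOFS_SUPER_MAGIC
#define AUTOFS_SUPER_MAGIC 0x0187 /* value from <linux/magic.h> */
#endif

/* Hypothetical helper: classify a statfs() f_type value.
 * autofs_counts_as_cluster == true  -> err toward TRUE for autofs
 * autofs_counts_as_cluster == false -> err toward FALSE, matching the
 *                                      current statfs()-failure behavior */
static bool fs_type_is_cluster(long f_type, bool autofs_counts_as_cluster)
{
    switch (f_type) {
    case NFS_SUPER_MAGIC:
        return true;
    case AUTOFS_SUPER_MAGIC:
        return autofs_counts_as_cluster;
    default:
        return false;
    }
}
```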
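For the /proc/mounts route, glibc's setmntent()/getmntent()/endmntent()
from <mntent.h> can read the mount table. The sketch below uses a
hypothetical helper, lookup_fs_type(), that picks the longest mount
point that is a whole-component prefix of the path. When the same mount
point appears more than once (typically an NFS mount stacked on an
autofs trigger), the later entry wins. It assumes the caller passes an
absolute, canonical path (e.g. from realpath()). The table file is a
parameter, normally "/proc/mounts".

```c
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if mount point mnt covers path on a path-component boundary,
 * so "/home" covers "/home/alice" but not "/homework". */
static bool mount_covers(const char *mnt, const char *path)
{
    size_t n = strlen(mnt);

    if (strcmp(mnt, "/") == 0)
        return path[0] == '/';
    if (strncmp(mnt, path, n) != 0)
        return false;
    return path[n] == '\0' || path[n] == '/';
}

/* Hypothetical helper: copy the filesystem type of the mount covering
 * path into type (of size len). Returns 0 on success, -1 otherwise. */
static int lookup_fs_type(const char *table, const char *path,
                          char *type, size_t len)
{
    FILE *fp = setmntent(table, "r");
    struct mntent *ent;
    size_t best = 0;
    bool found = false;

    if (fp == NULL)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        size_t n = strlen(ent->mnt_dir);

        if (!mount_covers(ent->mnt_dir, path))
            continue;
        /* ">=" lets a later entry on the same mount point win. */
        if (!found || n >= best) {
            snprintf(type, len, "%s", ent->mnt_type);
            best = n;
            found = true;
        }
    }
    endmntent(fp);
    return found ? 0 : -1;
}
```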
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900