I have found a system that triggers two failure modes in opal_path_nfs() that are new, as far as I can tell.

This is a Linux/PPC64 host, but NOT the BG/P front-end I've been reporting other issues with.
This is also with gcc, not XLC.  So, this is a "normal" Linux/PPC system.
I'll provide platform details on request, but I don't think they are relevant to the problems.

The first is the sheer size of the NFS-mounted filesystems (79TB), which causes the statfs() call in opal_path_nfs() to fail with errno=EOVERFLOW. In this case the f_type field still appears to be valid (I printed it out to confirm), but opal_path_nfs() has already given up and incorrectly returns 0, indicating that the path is NOT on a cluster filesystem. It seems like a simple matter to code an exception for this particular errno value.
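Below is a minimal sketch of the EOVERFLOW exception. It assumes, as observed on this one system and not as anything the statfs() documentation promises, that f_type is still filled in when only the size fields overflow. The helper names classify_statfs_result() and path_is_nfs_sketch() are hypothetical, not existing OPAL functions, and only NFS_SUPER_MAGIC is checked here for brevity:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/vfs.h>   /* statfs() and struct statfs on Linux */

/* Value from <linux/magic.h>, repeated so the sketch stands alone. */
#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969
#endif

/* Hypothetical helper: decide from statfs()'s return code, errno and f_type
 * whether the path is on NFS.
 * Returns 1 (NFS), 0 (not NFS) or -1 (unknown; caller picks a default). */
static int classify_statfs_result(int rc, int err, long f_type)
{
    if (rc == 0)
        return f_type == NFS_SUPER_MAGIC;

    /* ASSUMPTION (observed on a 79TB mount, not guaranteed): on EOVERFLOW
     * only the size/count fields overflowed and f_type was still written.
     * A zero f_type is taken to mean the buffer was never filled in. */
    if (err == EOVERFLOW && f_type != 0)
        return f_type == NFS_SUPER_MAGIC;

    return -1;
}

/* Hypothetical wrapper around statfs() using the classification above. */
static int path_is_nfs_sketch(const char *path)
{
    struct statfs buf;
    int rc;

    assert(path != NULL);
    memset(&buf, 0, sizeof buf);   /* an f_type the call never wrote stays 0 */
    errno = 0;
    rc = statfs(path, &buf);
    return classify_statfs_result(rc, errno, (long) buf.f_type);
}
```

Returning -1 for "unknown" leaves the choice of erring toward FALSE or TRUE to the caller, instead of silently folding every failure into 0.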

The second is that the system uses an automounter to mount home directories over NFS. As a result, those mount points report an f_type of AUTOFS_SUPER_MAGIC instead of NFS_SUPER_MAGIC. If one is willing to assume that all automounted filesystems are cluster filesystems, then the fix in this case should also be relatively simple. I suppose the acceptability of that approach depends on whether one wants opal_path_nfs() to err toward FALSE or toward TRUE. If the behavior on failure of statfs() is taken as the precedent, then one probably should NOT assume that AUTOFS_SUPER_MAGIC indicates a cluster filesystem. In that case I have no suggestion for determining the "real" filesystem type, other than parsing /proc/mounts (which seems acceptable, given that statfs() is already Linux-specific).
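A rough sketch of the /proc/mounts approach, using glibc's setmntent()/getmntent(). It picks the longest mount point that contains the path; on ties it keeps the later entry, on the assumption that once the automounter has mounted a directory, the real NFS entry appears after (and is stacked on top of) its autofs entry. fs_type_from_mounts() and mount_match_len() are hypothetical names, and the path is assumed to be absolute and already canonicalized (e.g. with realpath()):

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Length of mnt_dir if it is a whole-component prefix of path, else 0. */
static size_t mount_match_len(const char *mnt_dir, const char *path)
{
    size_t n = strlen(mnt_dir);

    if (strncmp(mnt_dir, path, n) != 0)
        return 0;
    if (n == 1 && mnt_dir[0] == '/')   /* "/" contains every absolute path */
        return 1;
    /* "/home" must match "/home" and "/home/paul" but not "/homework". */
    return (path[n] == '\0' || path[n] == '/') ? n : 0;
}

/* Hypothetical helper: copy the type of the mount containing `path`
 * (absolute, canonical) from mount table `table` into `out`.
 * Returns 0 on success, -1 if the table is unreadable or nothing matches. */
static int fs_type_from_mounts(const char *table, const char *path,
                               char *out, size_t outlen)
{
    FILE *fp;
    struct mntent *m;
    size_t best = 0;

    assert(table && path && out && outlen > 0);
    fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    while ((m = getmntent(fp)) != NULL) {
        size_t len = mount_match_len(m->mnt_dir, path);
        /* ">=": a later entry on the same mount point shadows the earlier
         * one, e.g. the nfs mount layered over its autofs trigger. */
        if (len > 0 && len >= best) {
            best = len;
            snprintf(out, outlen, "%s", m->mnt_type);
        }
    }
    endmntent(fp);
    return best > 0 ? 0 : -1;
}
```

On a live system one would pass "/proc/mounts" as the table and compare the result against "nfs" (and presumably "nfs4").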

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
