[OMPI devel] 1.5rc5: new opal_path_nfs test failures

Paul H. Hargrove Fri, 27 Aug 2010 00:19:55 -0400

I have found a system that is triggering two failure modes in opal_path_nfs() that are new, as far as I can tell.

This is a Linux/PPC64 host, but NOT the BG/P front-end I've been reporting other issues with.

This is also with gcc, not XLC.  So, this is a "normal" Linux/PPC system.

I'll provide platform details on request, but I don't think they are relevant to the problems.

The first is the sheer size of the NFS-mounted filesystems (79TB), which causes the statfs() call in opal_path_nfs() to fail with errno=EOVERFLOW. In this case the f_type field still appears to be valid (I printed it out to confirm), but opal_path_nfs() has already given up and incorrectly returns 0, indicating that the path is NOT on a cluster filesystem. This particular errno value seems like a simple matter to code an exception for.
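An EOVERFLOW exception could look roughly like the sketch below. It is not the actual opal_path_nfs() code; statfs_result_usable() and path_is_nfs() are hypothetical names. Trusting f_type after a failed statfs() rests only on the observation above, not on any documented guarantee, so the sketch zeroes the buffer first and still gives up if f_type comes back as 0 (meaning nothing was written). NFS_SUPER_MAGIC (0x6969) is the value from <linux/magic.h>, defined locally in case that header is not included.

```c
#include <errno.h>
#include <string.h>
#include <sys/vfs.h>   /* statfs(), Linux-specific */

#ifndef NFS_SUPER_MAGIC
#define NFS_SUPER_MAGIC 0x6969   /* value from <linux/magic.h> */
#endif

/* Hypothetical helper: returns 1 if f_type from a statfs() call can be
 * used, 0 otherwise. rc/err are statfs()'s return value and errno. */
static int statfs_result_usable(int rc, int err, long f_type)
{
    if (rc == 0)
        return 1;
    /* EOVERFLOW: some counts were too large for struct statfs.
     * Assumption (from observation, not documentation): f_type may
     * still have been filled in. The caller zeroes the buffer first,
     * so f_type == 0 means nothing usable was written. */
    if (err == EOVERFLOW && f_type != 0)
        return 1;
    return 0;
}

/* Hypothetical wrapper: 1 if path appears to be on NFS, 0 if not or
 * if the filesystem type could not be determined. */
static int path_is_nfs(const char *path)
{
    struct statfs buf;
    memset(&buf, 0, sizeof(buf));   /* lets us detect "nothing written" */
    int rc = statfs(path, &buf);
    int err = (rc == 0) ? 0 : errno;
    if (!statfs_result_usable(rc, err, (long)buf.f_type))
        return 0;
    return buf.f_type == NFS_SUPER_MAGIC;
}
```

Any other errno still errs toward 0 ("not a cluster filesystem"), matching the current behavior on statfs() failure.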

The second is that the system is using an automounter to mount home directories over NFS. This results in those mount points getting an f_type of AUTOFS_SUPER_MAGIC instead of NFS_SUPER_MAGIC. If one is willing to assume that all automounted filesystems are cluster filesystems, then the solution in this case should also be relatively simple. I suppose the acceptability of this approach depends on whether one wants opal_path_nfs() to err toward FALSE or toward TRUE. If the behavior on failure of statfs() is to be taken as the precedent, then one probably should NOT assume that AUTOFS_SUPER_MAGIC indicates a cluster filesystem, and I have no suggestion (other than parsing /proc/mounts, given that statfs() is already Linux-specific) as to how one should determine the "real" filesystem type.
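The /proc/mounts fallback mentioned above might be sketched as follows, using getmntent() from <mntent.h>. mount_type_for_path() is a hypothetical name. The sketch assumes the path is absolute and already canonicalized (for example with realpath()), and that when several entries share the same mount point, the later one (e.g. an nfs mount stacked on an autofs trigger) is the one on top. getmntent() uses a static buffer, so it is not thread-safe.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if mount point `dir` is a path-component prefix of `path`. */
static int mount_covers(const char *dir, const char *path)
{
    size_t n = strlen(dir);
    if (strncmp(dir, path, n) != 0)
        return 0;
    /* "/" covers everything; otherwise the match must end at '/' or at
       the end of `path`, so "/home" does not match "/homework". */
    return n == 1 || path[n] == '\0' || path[n] == '/';
}

/* Hypothetical helper: copy into `type` the filesystem type of the
 * longest mount point containing `path`, read from a table in
 * /proc/mounts format. Returns 0 on success, -1 if nothing matched. */
static int mount_type_for_path(FILE *table, const char *path,
                               char *type, size_t typelen)
{
    struct mntent *ent;
    size_t best = 0;
    int found = 0;

    while ((ent = getmntent(table)) != NULL) {
        size_t n = strlen(ent->mnt_dir);
        /* ">=" lets a later entry for the same directory (the mount on
           top of the stack) replace an earlier autofs entry. */
        if (mount_covers(ent->mnt_dir, path) && n >= best) {
            best = n;
            snprintf(type, typelen, "%s", ent->mnt_type);
            found = 1;
        }
    }
    return found ? 0 : -1;
}
```

In real use the table would be opened with setmntent("/proc/mounts", "r") and closed with endmntent(). A result of "autofs" would still mean the real filesystem has not been mounted yet, so the caller must decide which way to err in that case.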


-Paul

--
Paul H. Hargrove                          [email protected]
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
