Here is the offending code:

    /* get the name of the user */
    uid = getuid();
#ifdef HAVE_GETPWUID
    pwdent = getpwuid(uid);
#else
    pwdent = NULL;
#endif
    if (NULL != pwdent) {
        user = strdup(pwdent->pw_name);
    } else {
        orte_show_help("help-orte-runtime.txt",
                       "orte:session:dir:nopwname", true);
        return ORTE_ERR_OUT_OF_RESOURCE;
    }

Is it possible that this platform doesn't have getpwuid? I'm surprised by
the code, since we could just use the uid instead. I'm not sure why this
more stringent test was applied.



On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> On yet another test platform I see the following:
> 
> $ mpirun -mca btl sm,self -np 1 examples/ring_c
> --------------------------------------------------------------------------
> Open MPI was unable to obtain the username in order to create a path
> for its required temporary directories.  This type of error is usually
> caused by a transient failure of network-based authentication services
> (e.g., LDAP or NIS failure due to network congestion), but can also be
> an indication of system misconfiguration.
> 
> Please consult your system administrator about these issues and try
> again.
> --------------------------------------------------------------------------
> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file 
> /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c
>  at line 380
> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file 
> /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c
>  at line 599
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   orte_session_dir failed
>   --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> 
> 
> An "-np 2" run fails in the same manner.
> This is a production system and there is no problem with "whoami" or "id", 
> leaving me doubting the explanation provided by the error message.
> 
> [phh1@biou2 ~]$ whoami
> phh1
> [phh1@biou2 ~]$ id
> uid=44154(phh1) gid=2016(hpc) 
> groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)
> 
> The "ompi_info --all" output is attached.
> Please let me know what additional info is needed.
> 
> -Paul
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> <biou2_info.txt.bz2>
