Interesting - still, I see no reason for OMPI to fail just because of that. We 
can run just fine with the uid, so I'll make things a little more flexible.

Thanks for tracking it down!

On Jan 22, 2014, at 7:54 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Not lacking getpwuid():
> 
> [phh1@biou2 BLD]$ grep HAVE_GETPWUID */include/*_config.h
> opal/include/opal_config.h:#define HAVE_GETPWUID 1
> 
> I also can't see why the quoted code could fail.
> The following is working fine:
> 
> [phh1@biou2 BLD]$ cat q.c
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <pwd.h>
> int main(void) {
>    uid_t uid = getuid();
>    printf("uid = %d\n", (int)uid);
>    struct passwd *p = getpwuid(uid); 
>    if (p) printf("name = %s\n", p->pw_name);
>    return 0;
> }
> 
> [phh1@biou2 BLD]$ gcc -std=c99 q.c && ./a.out
> uid = 44154
> name = phh1
> 
> HOWEVER, the same program built for an ILP32 target (as in the reported
> failure) prints no name, i.e. getpwuid() returns NULL:
> 
> [phh1@biou2 BLD]$ gcc -m32 -std=c99 q.c && ./a.out
> uid = 44154
> 
> So, I am going to guess that this *is* a system misconfiguration (maybe 
> missing the 32-bit foo.so for the appropriate nsswitch resolver?) just as the 
> error message said.
> 
> Sorry for the false alarm,
> -Paul
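
One way to narrow down the 32-bit failure above is to check errno around the
call. This is a sketch only, assuming POSIX semantics (errno left at 0 when no
entry exists); probe_pwuid is a hypothetical name, and some C libraries set
errno even for a plain "not found", so the distinction is a hint rather than
proof.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: classify the outcome of getpwuid(uid).
 * Returns 0 if an entry was found, 1 if no entry was reported,
 * and -1 if the lookup signalled an error through errno. */
static int probe_pwuid(uid_t uid)
{
    struct passwd *p;
    int err;

    errno = 0;              /* clear first so a stale value is not misread */
    p = getpwuid(uid);
    err = errno;            /* save before printf() can overwrite it */

    if (NULL != p) {
        printf("uid %lu -> name %s\n", (unsigned long)uid, p->pw_name);
        return 0;
    }
    if (0 == err) {
        printf("uid %lu: no passwd entry found\n", (unsigned long)uid);
        return 1;
    }
    printf("uid %lu: lookup error: %s\n", (unsigned long)uid, strerror(err));
    return -1;
}
```

Calling probe_pwuid(getuid()) from a small main() built with and without -m32,
as in the q.c test, would show whether the ILP32 lookup reports an error or
just an empty result.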
> 
> 
> On Wed, Jan 22, 2014 at 7:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Here is the offending code:
> 
>      /* get the name of the user */
>     uid = getuid();
> #ifdef HAVE_GETPWUID
>     pwdent = getpwuid(uid);
> #else
>     pwdent = NULL;
> #endif
>     if (NULL != pwdent) {
>         user = strdup(pwdent->pw_name);
>     } else {
>         orte_show_help("help-orte-runtime.txt",
>                        "orte:session:dir:nopwname", true);
>         return ORTE_ERR_OUT_OF_RESOURCE;
>     }
> 
> Is it possible that you don't have getpwuid on this platform? I'm surprised
> at the code, as we could just use the uid instead. I'm not sure why this more
> stringent test was applied.
> 
> 
> 
> On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
>> On yet another test platform I see the following:
>> 
>> $ mpirun -mca btl sm,self -np 1 examples/ring_c
>> --------------------------------------------------------------------------
>> Open MPI was unable to obtain the username in order to create a path
>> for its required temporary directories.  This type of error is usually
>> caused by a transient failure of network-based authentication services
>> (e.g., LDAP or NIS failure due to network congestion), but can also be
>> an indication of system misconfiguration.
>> 
>> Please consult your system administrator about these issues and try
>> again.
>> --------------------------------------------------------------------------
>> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c at line 380
>> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c at line 599
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>> 
>>   orte_session_dir failed
>>   --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> 
>> 
>> An "-np 2" run fails in the same manner.
>> This is a production system and there is no problem with "whoami" or "id", 
>> leaving me doubting the explanation provided by the error message.
>> 
>> [phh1@biou2 ~]$ whoami
>> phh1
>> [phh1@biou2 ~]$ id
>> uid=44154(phh1) gid=2016(hpc) groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)
>> 
>> The "ompi_info --all" output is attached.
>> Please let me know what additional info is needed.
>> 
>> -Paul
>> 
>> -- 
>> Paul H. Hargrove                          phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department     Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>> <biou2_info.txt.bz2>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
