Granted - cmr'd to 1.7.5 with you set to review
On Jan 23, 2014, at 7:35 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
> I agree. A configure option to disable the use of getpwuid would be
> great as it is one of those functions that can never be static. getpwuid
> also fails for no particular reason on at least one XC30.
>
> -Nathan
>
> On Wed, Jan 22, 2014 at 08:57:20PM -0800, Ralph Castain wrote:
>> Interesting - still, I see no reason for OMPI to fail just because of
>> that. We can run just fine with the uid, so I'll make things a little more
>> flexible.
>> Thanks for tracking it down!
>> On Jan 22, 2014, at 7:54 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> Not lacking getpwuid():
>> [phh1@biou2 BLD]$ grep HAVE_GETPWUID */include/*_config.h
>> opal/include/opal_config.h:#define HAVE_GETPWUID 1
>> I also can't see why the quoted code could fail.
>> The following is working fine:
>> [phh1@biou2 BLD]$ cat q.c
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>> #include <pwd.h>
>> int main(void) {
>>     uid_t uid = getuid();
>>     printf("uid = %d\n", (int)uid);
>>     struct passwd *p = getpwuid(uid);
>>     if (p) printf("name = %s\n", p->pw_name);
>>     return 0;
>> }
>> [phh1@biou2 BLD]$ gcc -std=c99 q.c && ./a.out
>> uid = 44154
>> name = phh1
>> HOWEVER, building for an ILP32 target (as in the reported failure), the
>> name lookup fails:
>> [phh1@biou2 BLD]$ gcc -m32 -std=c99 q.c && ./a.out
>> uid = 44154
>> So, I am going to guess that this *is* a system misconfiguration (maybe
>> missing the 32-bit foo.so for the appropriate nsswitch resolver?) just
>> as the error message said.
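Paul's q.c prints the name only when getpwuid() succeeds, so the -m32 run cannot say why the lookup came back empty. Below is a minimal sketch of a more talkative variant, assuming a POSIX system; `lookup_user`, `report_current_user` and the `LOOKUP_*` codes are hypothetical names introduced here, not part of Open MPI. It clears errno before the call, as POSIX recommends, so a NULL return with errno still 0 can be read as "no entry" and a non-zero errno as a lookup error. The Linux getpwnam(3) man page notes that some implementations also set errno (ENOENT, ESRCH, EBADF, EPERM) for a plain "not found", so treat the split as a hint rather than proof.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum lookup_status { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_ERROR };

/* Look up the login name for uid and classify the outcome.
 * LOOKUP_FOUND:     name copied into buf (truncated to fit).
 * LOOKUP_NOT_FOUND: NULL returned and errno left at 0.
 * LOOKUP_ERROR:     NULL returned with errno set; the value is stored in *err. */
enum lookup_status lookup_user(uid_t uid, char *buf, size_t len, int *err)
{
    struct passwd *p;

    *err = 0;
    errno = 0;              /* POSIX: clear errno so a failure can be detected */
    p = getpwuid(uid);
    if (p != NULL) {
        snprintf(buf, len, "%s", p->pw_name);
        return LOOKUP_FOUND;
    }
    if (errno == 0)
        return LOOKUP_NOT_FOUND;
    *err = errno;
    return LOOKUP_ERROR;
}

/* Same output as q.c, plus a line explaining a failed lookup. */
void report_current_user(void)
{
    char name[256];
    int err;
    uid_t uid = getuid();

    printf("uid = %d\n", (int)uid);
    switch (lookup_user(uid, name, sizeof name, &err)) {
    case LOOKUP_FOUND:
        printf("name = %s\n", name);
        break;
    case LOOKUP_NOT_FOUND:
        printf("no passwd entry for uid %d\n", (int)uid);
        break;
    case LOOKUP_ERROR:
        printf("getpwuid failed: %s\n", strerror(err));
        break;
    }
}
```

Calling report_current_user() from a small main() built once with `gcc -std=c99` and once with `gcc -m32 -std=c99` would show whether the 32-bit lookup is reporting an error or simply finding no entry.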
>> Sorry for the false alarm,
>> -Paul
>>
>> On Wed, Jan 22, 2014 at 7:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>> Here is the offending code:
>> /* get the name of the user */
>> uid = getuid();
>> #ifdef HAVE_GETPWUID
>> pwdent = getpwuid(uid);
>> #else
>> pwdent = NULL;
>> #endif
>> if (NULL != pwdent) {
>>     user = strdup(pwdent->pw_name);
>> } else {
>>     orte_show_help("help-orte-runtime.txt",
>>                    "orte:session:dir:nopwname", true);
>>     return ORTE_ERR_OUT_OF_RESOURCE;
>> }
>> Is it possible on this platform that you don't have getpwuid? I'm
>> surprised at the code as we could just use the uid instead - not sure
>> why this more stringent test was applied.
>> On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>> On yet another test platform I see the following:
>> $ mpirun -mca btl sm,self -np 1 examples/ring_c
>>
>> --------------------------------------------------------------------------
>> Open MPI was unable to obtain the username in order to create a path
>> for its required temporary directories. This type of error is usually
>> caused by a transient failure of network-based authentication services
>> (e.g., LDAP or NIS failure due to network congestion), but can also be
>> an indication of system misconfiguration.
>> Please consult your system administrator about these issues and try
>> again.
>>
>> --------------------------------------------------------------------------
>> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c at line 380
>> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c at line 599
>>
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>> orte_session_dir failed
>> --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
>>
>> --------------------------------------------------------------------------
>> An "-np 2" run fails in the same manner.
>> This is a production system and there is no problem with "whoami" or
>> "id", leaving me doubting the explanation provided by the error
>> message.
>> [phh1@biou2 ~]$ whoami
>> phh1
>> [phh1@biou2 ~]$ id
>> uid=44154(phh1) gid=2016(hpc) groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)
>> The "ompi_info --all" output is attached.
>> Please let me know what additional info is needed.
>> -Paul
>> --
>> Paul H. Hargrove phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> <biou2_info.txt.bz2>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
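The patch Ralph committed is not shown in this thread. As a minimal sketch of the behaviour he and Nathan describe (prefer the login name, fall back to the numeric uid instead of aborting, and allow the lookup to be compiled out), assuming a POSIX system: `session_user_name` and `HYPOTHETICAL_DISABLE_GETPWUID` are made-up names, not Open MPI's, with the macro standing in for the configure option Nathan asks for. `HAVE_GETPWUID` is defaulted here only so the snippet builds outside the Open MPI tree; in an Open MPI build it comes from opal_config.h, as Paul's grep shows.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Standalone default only; inside Open MPI, opal_config.h decides this,
 * so these three lines would be dropped there. */
#ifndef HAVE_GETPWUID
#define HAVE_GETPWUID 1
#endif

/* Write a user name suitable for a session-directory path into buf.
 * Prefers the login name; if getpwuid() is unavailable, compiled out
 * (HYPOTHETICAL_DISABLE_GETPWUID, a stand-in for a configure option),
 * or returns NULL, falls back to the decimal uid instead of failing.
 * Returns 0 on success, -1 if buf is too small. */
int session_user_name(uid_t uid, char *buf, size_t len)
{
    int n;

#if HAVE_GETPWUID && !defined(HYPOTHETICAL_DISABLE_GETPWUID)
    struct passwd *pwdent = getpwuid(uid);
    if (NULL != pwdent) {
        n = snprintf(buf, len, "%s", pwdent->pw_name);
        return (n >= 0 && (size_t)n < len) ? 0 : -1;
    }
#endif
    /* No name available: the numeric uid is still unique per user. */
    n = snprintf(buf, len, "%lu", (unsigned long)uid);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With the switch defined at build time the code never calls getpwuid(), which is what a static build or the flaky XC30 lookup would need; without it, a failed lookup such as the -m32 case above degrades to a path component like "44154" rather than an ORTE_ERR_OUT_OF_RESOURCE abort.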