Granted - cmr'd to 1.7.5 with you set to review
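
For readers following the thread: the more flexible behavior discussed in the quoted messages below (use the passwd name when it is available, otherwise fall back to the numeric uid) could look roughly like the following. This is a minimal sketch and not the change that was actually committed; the helper name session_user_name and the SKETCH_DISABLE_GETPWUID macro are hypothetical stand-ins, the latter for the kind of configure option Nathan asks for.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the actual Open MPI code: write a name for
 * the given uid into buf.  Prefer the passwd entry; if the lookup fails
 * (or is compiled out), fall back to the numeric uid.
 * Returns 0 on success, -1 if buf is too small. */
static int session_user_name(uid_t uid, char *buf, size_t len)
{
    struct passwd *pwdent = NULL;
    int n;

    assert(NULL != buf && len > 0);

#if !defined(SKETCH_DISABLE_GETPWUID)   /* hypothetical build-time switch */
    pwdent = getpwuid(uid);
#endif

    if (NULL != pwdent && NULL != pwdent->pw_name) {
        n = snprintf(buf, len, "%s", pwdent->pw_name);
    } else {
        /* No passwd entry available: the numeric uid is still unique. */
        n = snprintf(buf, len, "%lu", (unsigned long)uid);
    }

    /* snprintf returns the length it wanted to write; detect truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass getuid() and a buffer, then use the resulting string when building the session directory path, instead of aborting when the passwd lookup returns NULL.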

On Jan 23, 2014, at 7:35 AM, Nathan Hjelm <hje...@lanl.gov> wrote:

> I agree. A configure option to disable the use of getpwuid would be
> great, as it is one of those functions that can never be linked
> statically. getpwuid also fails for no particular reason on at least
> one XC30.
> 
> -Nathan
> 
> On Wed, Jan 22, 2014 at 08:57:20PM -0800, Ralph Castain wrote:
>>   Interesting - still, I see no reason for OMPI to fail just because of
>>   that. We can run just fine with the uid, so I'll make things a little more
>>   flexible.
>>   Thanks for tracking it down!
>>   On Jan 22, 2014, at 7:54 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>>     Not lacking getpwuid():
>>     [phh1@biou2 BLD]$ grep HAVE_GETPWUID */include/*_config.h
>>     opal/include/opal_config.h:#define HAVE_GETPWUID 1
>>     I also can't see why the quoted code could fail.
>>     The following is working fine:
>>     [phh1@biou2 BLD]$ cat q.c
>>     #include <stdio.h>
>>     #include <unistd.h>
>>     #include <sys/types.h>
>>     #include <pwd.h>
>>     int main(void) {
>>        uid_t uid = getuid();
>>        printf("uid = %d\n", (int)uid);
>>        struct passwd *p = getpwuid(uid); 
>>        if (p) printf("name = %s\n", p->pw_name);
>>        return 0;
>>     }
>>     [phh1@biou2 BLD]$ gcc -std=c99 q.c && ./a.out
>>     uid = 44154
>>     name = phh1
>>     HOWEVER, when built for an ILP32 target (as in the reported failure),
>>     the getpwuid() lookup fails and no name is printed:
>>     [phh1@biou2 BLD]$ gcc -m32 -std=c99 q.c && ./a.out
>>     uid = 44154
>>     So, I am going to guess that this *is* a system misconfiguration (maybe
>>     missing the 32-bit foo.so for the appropriate nsswitch resolver?) just
>>     as the error message said.
>>     Sorry for the false alarm,
>>     -Paul
>> 
>>     On Wed, Jan 22, 2014 at 7:36 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>       Here is the offending code:
>>           /* get the name of the user */
>>           uid = getuid();
>>       #ifdef HAVE_GETPWUID
>>           pwdent = getpwuid(uid);
>>       #else
>>           pwdent = NULL;
>>       #endif
>>           if (NULL != pwdent) {
>>               user = strdup(pwdent->pw_name);
>>           } else {
>>               orte_show_help("help-orte-runtime.txt",
>>                              "orte:session:dir:nopwname", true);
>>               return ORTE_ERR_OUT_OF_RESOURCE;
>>           }
>>       Is it possible that this platform doesn't have getpwuid? I'm
>>       surprised at the code, as we could just use the uid instead - not sure
>>       why this more stringent test was applied.
>>       On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>> 
>>         On yet another test platform I see the following:
>>         $ mpirun -mca btl sm,self -np 1 examples/ring_c
>>         --------------------------------------------------------------------------
>>         Open MPI was unable to obtain the username in order to create a path
>>         for its required temporary directories.  This type of error is usually
>>         caused by a transient failure of network-based authentication services
>>         (e.g., LDAP or NIS failure due to network congestion), but can also be
>>         an indication of system misconfiguration.
>>         Please consult your system administrator about these issues and try
>>         again.
>>         --------------------------------------------------------------------------
>>         [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c at line 380
>>         [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c at line 599
>>         --------------------------------------------------------------------------
>>         It looks like orte_init failed for some reason; your parallel process is
>>         likely to abort.  There are many reasons that a parallel process can
>>         fail during orte_init; some of which are due to configuration or
>>         environment problems.  This failure appears to be an internal failure;
>>         here's some additional information (which may only be relevant to an
>>         Open MPI developer):
>>           orte_session_dir failed
>>           --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
>>         --------------------------------------------------------------------------
>>         An "-np 2" run fails in the same manner.
>>         This is a production system and there is no problem with "whoami" or
>>         "id", leaving me doubting the explanation provided by the error
>>         message.
>>         [phh1@biou2 ~]$ whoami
>>         phh1
>>         [phh1@biou2 ~]$ id
>>         uid=44154(phh1) gid=2016(hpc) groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)
>>         The "ompi_info --all" output is attached.
>>         Please let me know what additional info is needed.
>>         -Paul
>>         --
>>         Paul H. Hargrove                          phhargr...@lbl.gov
>>         Future Technologies Group
>>         Computer and Data Sciences Department     Tel: +1-510-495-2352
>>         Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>         <biou2_info.txt.bz2>
>>         _______________________________________________
>>         devel mailing list
>>         de...@open-mpi.org
>>         http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>>     --
>>     Paul H. Hargrove                          phhargr...@lbl.gov
>>     Future Technologies Group
>>     Computer and Data Sciences Department     Tel: +1-510-495-2352
>>     Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

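A note on diagnosing this kind of failure: Paul's test program in the thread prints nothing when getpwuid() returns NULL, so it cannot tell a missing entry from a lookup error. POSIX says to set errno to 0 before the call and inspect it afterwards to distinguish the two cases. The following is a minimal sketch under that assumption; the function and enum names are illustrative only, and some C libraries are known to set errno even when the entry is simply absent, so the result should be read as a hint rather than a verdict.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Possible outcomes of a passwd lookup (names are illustrative). */
enum lookup_result { LOOKUP_FOUND, LOOKUP_NO_ENTRY, LOOKUP_ERROR };

/* Look up uid and report what happened on the given stream. */
static enum lookup_result describe_uid(uid_t uid, FILE *out)
{
    struct passwd *p;
    int saved_errno;

    assert(NULL != out);

    errno = 0;                  /* clear so "no entry" and "error" differ */
    p = getpwuid(uid);
    saved_errno = errno;        /* save before any further library calls */

    if (NULL != p) {
        fprintf(out, "uid = %lu, name = %s\n",
                (unsigned long)uid, p->pw_name);
        return LOOKUP_FOUND;
    }
    if (0 == saved_errno) {
        fprintf(out, "uid = %lu: no passwd entry found\n",
                (unsigned long)uid);
        return LOOKUP_NO_ENTRY;
    }
    fprintf(out, "uid = %lu: getpwuid failed: %s\n",
            (unsigned long)uid, strerror(saved_errno));
    return LOOKUP_ERROR;
}
```

Calling describe_uid(getuid(), stderr) from both a 64-bit and a -m32 build of the same source would show whether the 32-bit lookup reports an error or just finds no entry.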