We didn't check the boundaries of the shell string from the structure
returned by getpwuid. This should be fixed by commit r20853.
Thanks for your help,
george.
On Mar 24, 2009, at 01:37, Sergey E. Koposov wrote:
Hi All,
I've found that openmpi-1.3.1 segfaults when the shell field in
the passwd file is empty.
Take this trivial test program:
--------------------------------------
#include <stdio.h>
#include "mpi.h"
int main (int argc, char **argv) {
    int nworkers, whoami;
    MPI_Init(&argc, &argv);
    /* Query this process's rank and the total number of processes. */
    MPI_Comm_rank(MPI_COMM_WORLD, &whoami);
    MPI_Comm_size(MPI_COMM_WORLD, &nworkers);
    printf("%d %d ",whoami, nworkers);
    MPI_Finalize();
    return 0;
}
----------------------------------
Compile it and run it with mpirun.
I get a segfault:
----------------------------------
[fortune:05346] *** Process received signal ***
[fortune:05346] Signal: Segmentation fault (11)
[fortune:05346] Signal code: Address not mapped (1)
[fortune:05346] Failing at address: 0x1
[fortune:05346] [ 0] [0xffffe40c]
[fortune:05346] [ 1] /usr/lib/openmpi/mca_plm_rsh.so [0xb7f9baa1]
[fortune:05346] [ 2] /usr/lib/openmpi/mca_plm_rsh.so [0xb7f9d291]
[fortune:05346] [ 3] /usr/bin/mpirun [0x804a8cb]
[fortune:05346] [ 4] /usr/bin/mpirun [0x8049ff2]
[fortune:05346] [ 5] /lib/libc.so.6(__libc_start_main+0xe0) [0xb7d56390]
[fortune:05346] [ 6] /usr/bin/mpirun [0x8049f71]
[fortune:05346] *** End of error message ***
--------------------------
Here is the gdb backtrace:
------------------------------
0xb7dc08c1 in strcmp () from /lib/libc.so.6
(gdb) bt
#0 0xb7dc08c1 in strcmp () from /lib/libc.so.6
#1 0xb7f0ecc9 in find_shell (shell=0x8074b95 "") at plm_rsh_module.c:1459
#2 0xb7f0ce8b in setup_launch (argcptr=0xbfce5960, argvptr=0xbfce5968,
   nodename=0x80795c0 "fortune", node_name_index1=0xbfce5970,
   proc_vpid_index=0xbfce596c, prefix_dir=0x805b028 "/tmp/openmpi_inst")
   at plm_rsh_module.c:376
#3 0xb7f0e181 in orte_plm_rsh_launch (jdata=0x80539a8)
   at plm_rsh_module.c:1051
#4 0x0804a8eb in orterun (argc=4, argv=0xbfce5b74) at orterun.c:680
#5 0x0804a012 in main (argc=Cannot access memory at address 0x1
) at main.c:13
(gdb)
---------------------------
The backtrace shows find_shell() being called with an empty string, so
the segfault occurs because the shell field returned by
getpwuid(getuid()) is empty (as it is in /etc/passwd). As far as I
understand, an empty shell field in the passwd file is perfectly valid
and is treated as /bin/sh (see man 5 passwd).
So I guess setup_launch() should simply check for an empty pw_shell in
that case. Something like this:
-----------------------------------------------
--- openmpi-1.3.1/orte/mca/plm/rsh/plm_rsh_module.c.orig 2009-03-24 06:22:06.000000000 +0100
+++ openmpi-1.3.1/orte/mca/plm/rsh/plm_rsh_module.c 2009-03-24 06:24:07.000000000 +0100
@@ -372,8 +372,11 @@
orte_show_help( "help-plm-rsh.txt", "unknown-user", true,
(int)getuid() );
return ORTE_ERR_FATAL;
} else {
- param = p->pw_shell;
- local_shell = find_shell(p->pw_shell);
+ if (!((p->pw_shell)[0]))
+ param="/bin/sh";
+ else
+ param = p->pw_shell;
+ local_shell = find_shell(param);
}
/* If we didn't find it in getpwuid(), try looking at the $SHELL
environment variable (see https://svn.open-mpi.org/trac/ompi/ticket/1060)
----------------------
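For illustration, the same check can also be written as a pair of small
standalone helpers. This is only a sketch: login_shell_or_default() and
shell_basename() are hypothetical names, not functions from the Open MPI
tree. The second helper assumes that find_shell() looks at the last path
component of the shell string, which is a guess from the backtrace and
not something confirmed against the source.

```c
#include <pwd.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the user's login shell, falling back to
 * /bin/sh when the passwd entry is missing or its shell field is
 * NULL or empty (passwd(5) treats an empty field as /bin/sh). */
static const char *login_shell_or_default(const struct passwd *p)
{
    if (p == NULL || p->pw_shell == NULL || p->pw_shell[0] == '\0') {
        return "/bin/sh";
    }
    return p->pw_shell;
}

/* Hypothetical helper: return the last path component of a shell path.
 * If the string contains no '/', return it unchanged instead of doing
 * pointer arithmetic on the NULL that strrchr() returns. */
static const char *shell_basename(const char *shell)
{
    const char *slash = strrchr(shell, '/');
    return (slash != NULL) ? slash + 1 : shell;
}
```

With these, a caller would pass the result of getpwuid(getuid()) to
login_shell_or_default() and then compare shell_basename() of that value
against the list of known shell names.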
Regards,
Sergey
*******************************************************************
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/
Sternberg Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: m...@sai.msu.ru