Ralph,
This patch fixed it: num_nodes was being used uninitialised, and hence the
client was getting a bogus value for the number of nodes.
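
For anyone following along, here is a minimal standalone sketch of the
pattern, not the actual orted_comm.c code. The function count_nodes and its
plain pointer-array "pool" are hypothetical stand-ins for the loop over
orte_node_pool; the only point is that the counter has to be zeroed before
the loop adds to it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the node pool: an array of pointers in which
 * unused slots are NULL. This is an illustration, not the ORTE code. */
static int count_nodes(void *const *pool, size_t size)
{
    int num_nodes = 0;  /* without this, the loop increments an indeterminate value */
    size_t i;

    /* count number of non-NULL entries */
    for (i = 0; i < size; i++) {
        if (NULL != pool[i]) {
            num_nodes++;
        }
    }
    return num_nodes;
}
```

Without the initialisation the result depends on whatever happened to be on
the stack, which would fit the client being handed a bogus node count.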
Ashley,
On Mon, 2009-05-18 at 10:09 +0100, Ashley Pittman wrote:
> No joy I'm afraid, now I get errors when I run it. This is a single
> node job run with the command line "mpirun -n 3 ./a.out". I've attached
> the strace output and gzipped /tmp files from the machine. Valgrind on
> the ompi-ps process doesn't show anything interesting.
>
> [alpha:29942] [[35044,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file /mnt/home/debian/ashley/code/OpenMPI/ompi-trunk-tes/trunk/orte/util/comm/comm.c at line 242
> [alpha:29942] [[35044,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file /mnt/home/debian/ashley/code/OpenMPI/ompi-trunk-tes/trunk/orte/tools/orte-ps/orte-ps.c at line 818
>
> Ashley.
>
> On Sat, 2009-05-16 at 08:15 -0600, Ralph Castain wrote:
> > This is fixed now, Ashley - sorry for the problem.
> >
> >
> > On May 15, 2009, at 4:47 AM, Ashley Pittman wrote:
> >
> > > On Thu, 2009-05-14 at 22:49 -0600, Ralph Castain wrote:
> > >> It is definitely broken at the moment, Ashley. I have it pretty well
> > >> fixed, but need/want to clean up some corner cases that have plagued
> > >> us for a long time.
> > >>
> > >> Should have it for you sometime Friday.
> > >
> > > Ok, thanks. I might try switching to slurm in the meantime, I know
> > > my code works with that.
> > >
> > > Can you let me know when it's fixed, on or off list, and I'll do an
> > > update.
> > >
> > > Ashley,
> > >
> > > _______________________________________________
> > > devel mailing list
> > > [email protected]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Index: orte/orted/orted_comm.c
===================================================================
--- orte/orted/orted_comm.c (revision 21248)
+++ orte/orted/orted_comm.c (working copy)
@@ -837,6 +837,7 @@
goto CLEANUP;
}
} else {
+ num_nodes = 0;
/* count number of nodes */
for (i=0; i < orte_node_pool->size; i++) {
if (NULL != opal_pointer_array_get_item(orte_node_pool, i)) {