Ok, I'll try this with the latest trunk.
thanks,
--td
George Bosilca wrote:
Apparently it was with r19845, so before the patch that is supposed to
fix this issue. Terry, can you please test with a more recent version
(> r19929)?
Thanks,
george.
On Nov 8, 2008, at 9:54 AM, Edgar Gabriel wrote:
Terry,
was this with the trunk or v1.3? If it was the trunk, was it before
r19929 was applied? The reason I ask is that r19929 should remove
all error messages related to 'non-existing communicators'. Hierarch,
by the way, is not the cause of the error messages even before that;
it just exposes the problem more frequently...
Thanks
Edgar
Terry Dontje wrote:
I am seeing the message "Dropped message for the non-existing
communicator" when running hpcc with np=124 against r19845. This
seems to be pretty reproducible at np=124. When the job prints out
the message above, some set of processes is in MPI_Bcast and the
15 processes reporting the message are stuck in MPI_Barrier.
I am not sure how related this is to #1408 since I am not invoking
the hierarchical collectives. I just wanted to see if anyone else
has tried to run hpcc at such an np size with any success.
My next steps are to try to run this with the latest trunk and to
narrow down the failing case.
--td
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335