But I am compiling Open MPI with --without-memory-manager, so shouldn't valgrind work in that case?
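(For anyone following along: with the memory manager disabled, glibc's own ptmalloc does the allocating, and the "glibc detected" messages from my earlier run, quoted below, are what its consistency checks print when heap bookkeeping gets clobbered by an over-run. A minimal, purely illustrative example of the kind of bug that triggers them; this is made up for illustration and is not Open MPI code:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf   = malloc(32);
        char *other = malloc(32);
        if (buf == NULL || other == NULL) {
            return 1;
        }

        memset(buf, 0, 48);  /* over-run: writes 16 bytes past the end of buf */

        free(other);         /* glibc may abort here with a corruption message... */
        free(buf);           /* ...or here, or on some later allocation */
        return 0;
    }

The abort typically fires on a later malloc() or free(), far from the write that did the damage, which is why these reports are so hard to line up with the culprit.)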
Anyways, I ran the tests and valgrind is reporting 2 different (potentially related) problems:

1.
==12680== Invalid read of size 4
==12680==    at 0x709DE03: ompi_cb_fifo_write_to_head (ompi_circular_buffer_fifo.h:271)
==12680==    by 0x709DA77: ompi_fifo_write_to_head (ompi_fifo.h:324)
==12680==    by 0x709D964: mca_btl_sm_component_progress (btl_sm_component.c:398)
==12680==    by 0x705BF6B: mca_bml_r2_progress (bml_r2.c:110)
==12680==    by 0x44F905B: opal_progress (opal_progress.c:187)
==12680==    by 0x704F0E5: opal_condition_wait (condition.h:98)
==12680==    by 0x704EFD4: mca_pml_ob1_recv (pml_ob1_irecv.c:124)
==12680==    by 0x7202A62: ompi_coll_tuned_scatter_intra_binomial (coll_tuned_scatter.c:166)
==12680==    by 0x71F2C08: ompi_coll_tuned_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:746)
==12680==    by 0x4442494: PMPI_Scatter (pscatter.c:125)
==12680==    by 0x8048F6F: main (scatter_in_place.c:73)

2.
==28775== Jump to the invalid address stated on the next line
==28775==    at 0x2F305F35: ???
==28775==    by 0x704AF6B: mca_bml_r2_progress (bml_r2.c:110)
==28775==    by 0x44F905B: opal_progress (opal_progress.c:187)
==28775==    by 0x440BF6B: opal_condition_wait (condition.h:98)
==28775==    by 0x440BDF7: ompi_request_wait (req_wait.c:46)
==28775==    by 0x71EF396: ompi_coll_tuned_reduce_scatter_intra_basic_recursivehalving (coll_tuned_reduce_scatter.c:319)
==28775==    by 0x71E1540: ompi_coll_tuned_reduce_scatter_intra_dec_fixed (coll_tuned_decision_fixed.c:471)
==28775==    by 0x7202806: ompi_osc_pt2pt_module_fence (osc_pt2pt_sync.c:84)
==28775==    by 0x44501B5: PMPI_Win_fence (pwin_fence.c:57)
==28775==    by 0x80493D6: test_acc3_1 (test_acc3.c:156)
==28775==    by 0x8048FD0: test_acc3 (test_acc3.c:26)
==28775==    by 0x8049609: main (test_acc3.c:206)
==28775==  Address 0x2F305F35 is not stack'd, malloc'd or (recently) free'd

I don't know what to make of these. Here is the link to the full results:
http://www.open-mpi.org/mtt/index.php?do_redir=386

Thanks,

Tim

On Friday 21 September 2007 10:40:21 am George Bosilca wrote:
> Tim,
>
> Valgrind will not help ... It can help with things like double frees,
> but not with over-running memory that belongs to your application.
> However, in Open MPI we have something that might help you. The option
> --enable-mem-debug adds unused space at the end of each memory
> allocation and makes sure we don't write anything there. I think this
> is the simplest way to pinpoint this problem.
>
> Thanks,
>   george.
>
> On Sep 21, 2007, at 10:07 AM, Tim Prins wrote:
> > Aurelien and Brian,
> >
> > Thanks for the suggestions. I reran the runs with
> > --without-memory-manager and got (on 2 of 5000 runs):
> >
> >   *** glibc detected *** corrupted double-linked list: 0xf704dff8 ***
> >
> > on one, and
> >
> >   *** glibc detected *** malloc(): memory corruption: 0xeda00c70 ***
> >
> > on the other.
> >
> > So it looks like somewhere we are over-running our allocated space.
> > So now I am attempting to redo the run with valgrind.
> >
> > Tim
> >
> > On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote:
> >> On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:
> >>> In our nightly runs with the trunk I have started seeing cases
> >>> where we appear to be segfaulting within/below malloc. Below is a
> >>> typical output.
> >>>
> >>> Note that this appears to only happen on the trunk, when we use
> >>> openib, and are in 32-bit mode. It seems to happen randomly at a
> >>> very low frequency (59 out of about 60,000 32-bit openib runs).
> >>>
> >>> This could be a problem with our machine, and it has shown up since
> >>> I started testing 32-bit OFED 10 days ago.
> >>>
> >>> Anyways, just curious if anyone had any ideas.
> >>
> >> As someone else said, this usually points to a duplicate free or the
> >> like in malloc. You might want to try compiling with
> >> --without-memory-manager, as the ptmalloc2 in glibc is frequently
> >> more verbose about where errors occurred than the one in Open MPI is.
> >>
> >> Brian
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
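
For readers who have not looked at what --enable-mem-debug does: the guard-byte idea George describes above, padding every allocation with a few extra bytes of a known pattern and checking the pattern when the block is released, can be sketched roughly as follows. The names (guard_malloc, guard_free, GUARD_BYTES, GUARD_PATTERN) are made up for illustration; this is not Open MPI's actual implementation, and a real version would stash the requested size in a header rather than making the caller pass it back.

    /* Hypothetical sketch of guard-byte (red-zone) checking; not Open MPI code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define GUARD_BYTES   16     /* extra space appended to every allocation */
    #define GUARD_PATTERN 0xA5   /* known fill value for the guard region */

    /* Allocate size + GUARD_BYTES and fill the tail with the pattern. */
    static void *guard_malloc(size_t size)
    {
        unsigned char *p = malloc(size + GUARD_BYTES);
        if (p == NULL) {
            return NULL;
        }
        memset(p + size, GUARD_PATTERN, GUARD_BYTES);
        return p;
    }

    /* Verify the guard region is intact before releasing the block.
     * A changed byte means something wrote past the end of the buffer. */
    static void guard_free(void *ptr, size_t size)
    {
        unsigned char *p = ptr;
        size_t i;

        for (i = 0; i < GUARD_BYTES; i++) {
            if (p[size + i] != GUARD_PATTERN) {
                fprintf(stderr,
                        "over-run detected %zu byte(s) past the end of a "
                        "%zu-byte allocation\n", i + 1, size);
                abort();
            }
        }
        free(p);
    }

    int main(void)
    {
        char *buf = guard_malloc(32);
        if (buf == NULL) {
            return 1;
        }
        buf[32] = 'x';        /* one-byte over-run into the guard region */
        guard_free(buf, 32);  /* the check catches it here and aborts */
        return 0;
    }

The point of the technique is that the check fires at the first free() of a damaged block, which is usually much closer to the offending write than the eventual glibc abort or valgrind report.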