Jeff,

I ran IMB on 60 procs with the openib and self btls, and all ran fine. The tests that were run were ping-pong, ping-ping, SendRecv, Exchange, Allreduce, Reduce, Reduce_scatter, Allgather, Allgatherv, Alltoall, Bcast, and Barrier. I also ran on 40 procs, and did several smaller runs.

If you can reproduce the problem and provide more details (I realize you ran out of time), I can take another look. I would expect a bug in the changes to cause one to walk over memory, rather than to change the memory usage, but who knows.

I will be offline until late Sunday...
Rich

On 11/2/07 3:26 PM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:

> Rich -
>
> I'm not 100% sure it's fixed - I'm still seeing "out of memory" errors when
> running about 40 proc IMB over openib. But I ran out of time to investigate
> deeply...
>
> Could you try running a nontrivial omb to check?
>
> -jms
> Sent from my PDA
>
> -----Original Message-----
> From: Richard Graham [mailto:rlgra...@ornl.gov]
> Sent: Friday, November 02, 2007 02:07 PM Eastern Standard Time
> To: Open MPI Developers
> Subject: Re: [OMPI devel] openib currently broken
>
> R16641 should have fixed the regression. Anyone using ompi_free_list_t_ex()
> and providing a memory allocator would have been bitten by this, since I did
> not update this function (which will be deprecated in favor of a version
> parallel to ompi_free_list_t_new) to initialize the new fields defined.
> From looking through the btls, this seems to be only the openib btl.
>
> Rich
>
> On 11/2/07 12:31 PM, "Richard Graham" <rlgra...@ornl.gov> wrote:
>
>> On 11/2/07 12:21 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>
>>> The freelist changes from yesterday appear to have broken the openib
>>> btl. We didn't get lots of test failures in MTT last night only
>>> because there was a separate (unrelated) typo in the ofud BTL that
>>> prevented the nightly tarball from building on any IB-capable
>>> machines. :-)
>>>
>>> Rich hopes to look into fixing the openib BTL problem today; he
>>> thinks it's a case of a simple oversight: the openib BTL is not using
>>> the new freelist init functions.
>>>
>>> Rich: are there other places that are not using the new init
>>> functions that need to?
>>
>> the ompi free list has two init functions, I changed just one. The IB
>> btl uses the one I have not yet changed, but the pml uses the one I
>> did change.
>>
>> rich
>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
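
For anyone reading the thread later, the kind of oversight described above (two init functions for one structure, with only one of them updated when new fields were added) can be illustrated with a small sketch. This is not the Open MPI code: the demo_free_list_t type, its fields, and every demo_* function are hypothetical names made up for illustration, and the real ompi_free_list_t layout and init signatures are different. The sketch only shows one common way to guard against the problem, which is to route both entry points through a single helper that sets every field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical free list, NOT the real ompi_free_list_t layout. */
typedef struct {
    size_t elem_size;          /* size of each element in bytes */
    size_t num_allocated;      /* elements handed out so far */
    size_t max_elems;          /* imagined "new" field: cap, 0 = unlimited */
    void *(*alloc)(size_t);    /* allocator used to grow the list */
} demo_free_list_t;

/* The single place where every field is set. A field added later only
 * has to be initialized here, and both entry points below pick it up. */
static void demo_free_list_init_common(demo_free_list_t *fl,
                                       size_t elem_size,
                                       size_t max_elems,
                                       void *(*alloc)(size_t))
{
    assert(fl != NULL);
    fl->elem_size = elem_size;
    fl->num_allocated = 0;
    fl->max_elems = max_elems;
    fl->alloc = (alloc != NULL) ? alloc : malloc;
}

/* Plain entry point: uses the default allocator (malloc). */
static void demo_free_list_init(demo_free_list_t *fl,
                                size_t elem_size,
                                size_t max_elems)
{
    demo_free_list_init_common(fl, elem_size, max_elems, NULL);
}

/* Extended entry point: the caller supplies its own allocator. */
static void demo_free_list_init_ex(demo_free_list_t *fl,
                                   size_t elem_size,
                                   size_t max_elems,
                                   void *(*alloc)(size_t))
{
    demo_free_list_init_common(fl, elem_size, max_elems, alloc);
}

/* Allocate one more element. Returns NULL if the cap has been reached
 * or the allocator fails. With an uninitialized max_elems this check
 * would read garbage, which is the class of bug the shared helper avoids. */
static void *demo_free_list_grow(demo_free_list_t *fl)
{
    if (fl->max_elems != 0 && fl->num_allocated >= fl->max_elems) {
        return NULL;
    }
    void *p = fl->alloc(fl->elem_size);
    if (p != NULL) {
        fl->num_allocated++;
    }
    return p;
}
```

With a cap of 2, the first two calls to demo_free_list_grow() return elements and the third returns NULL, whichever of the two init functions was used. Whether the "out of memory" errors Jeff saw have anything to do with this pattern is not established anywhere in this thread.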