Sigh...is it really so much to ask that we at least run the tests in orte/test/system and orte/test/mpi using both mpirun and singleton (where appropriate) instead of just relying on "well I ran hello_world"?
That is all I have ever asked, yet it seems to be viewed as a huge impediment. Is it really that much to ask for when modifying a core part of the system? :-/

If you have done those tests, then my apologies - but your note only indicates that you ran "hello_world" and are basing your recommendation *solely* on that test.

On 6/6/07 7:51 AM, "Tim Prins" <tpr...@open-mpi.org> wrote:

> I hate to go back to this, but...
>
> The original commits also included changes to gpr_replica_dict_fn.c
> (r14331 and r14336). This change shows some performance improvement for
> me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness
> in the gpr. Again, this is an algorithmic change, so as the job scales
> the performance improvement would be more noticeable.
>
> I vote that this be put back in.
>
> On a related topic, a small memory leak was fixed in r14328, and then
> reverted. That fix should also be put back in.
>
> Tim
>
> George Bosilca wrote:
>> Commit r14791 applies this patch to the trunk. Let me know if you
>> encounter any kind of trouble.
>>
>> Thanks,
>> george.
>>
>> On May 29, 2007, at 2:28 PM, Ralph Castain wrote:
>>
>>> After some work off-list with Tim, it appears that something has been
>>> broken again on the OMPI trunk with respect to comm_spawn. It was
>>> working two weeks ago, but...sigh.
>>>
>>> Anyway, it doesn't appear to have any bearing either way on George's
>>> patch(es), so whoever wants to commit them is welcome to do so.
>>>
>>> Thanks
>>> Ralph
>>>
>>> On 5/29/07 11:44 AM, "Ralph Castain" <r...@lanl.gov> wrote:
>>>
>>>> On 5/29/07 11:02 AM, "Tim Prins" <tpr...@open-mpi.org> wrote:
>>>>
>>>>> Well, after fixing many of the tests...
>>>>
>>>> Interesting - they worked fine for me. Perhaps a difference in
>>>> environment.
>>>>
>>>>> It passes all the tests except the spawn tests. However, the spawn
>>>>> tests are seriously broken without this patch as well, and the ibm
>>>>> mpi spawn tests seem to work fine.
>>>>
>>>> Then something is seriously wrong. The spawn tests were working as of
>>>> my last commit - that is a test I religiously run. If the spawn test
>>>> here doesn't work, then it is hard to understand how the mpi spawn
>>>> can work, since the call is identical.
>>>>
>>>> Let me see what's wrong first...
>>>>
>>>>> As far as I'm concerned, this should assuage any fear of problems
>>>>> with these changes, and they should now go in.
>>>>>
>>>>> Tim
>>>>>
>>>>> On May 29, 2007, at 11:34 AM, Ralph Castain wrote:
>>>>>
>>>>>> Well, I'll be the voice of caution again...
>>>>>>
>>>>>> Tim: did you run all of the orte tests in the orte/test/system
>>>>>> directory? If so, and they all run correctly, then I have no issue
>>>>>> with doing the commit. If not, then I would ask that we not do the
>>>>>> commit until that has been done.
>>>>>>
>>>>>> In running those tests, you need to run them on a multi-node
>>>>>> system, both using mpirun and as singletons (you'll have to look at
>>>>>> the tests to see which ones make sense in the latter case). This
>>>>>> will ensure that we have at least some degree of coverage.
>>>>>>
>>>>>> Thanks
>>>>>> Ralph
>>>>>>
>>>>>> On 5/29/07 9:23 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote:
>>>>>>
>>>>>>> I'd be happy to commit the patch into the trunk. But after what
>>>>>>> happened last time, I'm more than cautious.
>>>>>>> If the community thinks the patch is worth having, let me know and
>>>>>>> I'll push it into the trunk asap.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> george.
>>>>>>>
>>>>>>> On May 29, 2007, at 10:56 AM, Tim Prins wrote:
>>>>>>>
>>>>>>>> I think both patches should be put in immediately. I have done
>>>>>>>> some simple testing, and with 128 nodes of odin, with 1024
>>>>>>>> processes running mpi hello, these decrease our running time from
>>>>>>>> about 14.2 seconds to 10.9 seconds. This is a significant
>>>>>>>> decrease, and as the scale increases there should be increasing
>>>>>>>> benefit.
>>>>>>>>
>>>>>>>> I'd be happy to commit these changes if no one objects.
>>>>>>>>
>>>>>>>> Tim
>>>>>>>>
>>>>>>>> On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:
>>>>>>>>
>>>>>>>>> Thanks - I'll take a look at this (and the prior ones!) in the
>>>>>>>>> next couple of weeks when time permits and get back to you.
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> On 5/23/07 1:11 PM, "George Bosilca" <bosi...@cs.utk.edu> wrote:
>>>>>>>>>
>>>>>>>>>> Attached is another patch to the ORTE layer, more specifically
>>>>>>>>>> the replica. The idea is to decrease the number of strcmp calls
>>>>>>>>>> by using a small hash function before doing the strcmp. The
>>>>>>>>>> hash key for each registry entry is computed when it is added
>>>>>>>>>> to the registry. When we're doing a query, instead of comparing
>>>>>>>>>> the two strings we first check whether the hash keys match, and
>>>>>>>>>> only if they do match do we compare the two strings, in order
>>>>>>>>>> to make sure we eliminate collisions from our answers.
>>>>>>>>>>
>>>>>>>>>> There is some benefit in terms of performance. It's hardly
>>>>>>>>>> visible for a few processes, but it starts showing up as the
>>>>>>>>>> number of processes increases. In fact, the number of strcmp
>>>>>>>>>> calls in the trace file decreases drastically. The main reason
>>>>>>>>>> it works well is that most of the keys start with basically the
>>>>>>>>>> same chars (such as orte-blahblah), so every strcmp has to loop
>>>>>>>>>> over those common leading chars before it can tell the keys
>>>>>>>>>> apart.
>>>>>>>>>>
>>>>>>>>>> Ralph, please consider it for inclusion in the ORTE layer.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> george.
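
For context, here is a minimal sketch of the hash-then-strcmp lookup George describes above. The names (dict_entry_t, compute_hash, dict_lookup) are illustrative, not the actual ORTE GPR symbols, and the djb2-style hash is an assumption chosen for brevity rather than the hash used in the patch:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative entry type: the hash is computed once, when the key is
 * added to the registry, and stored next to the string. */
typedef struct {
    uint32_t hash;
    char *key;
    void *payload;
} dict_entry_t;

/* Simple djb2-style string hash; any cheap hash would do here. */
uint32_t compute_hash(const char *s)
{
    uint32_t h = 5381;
    while (*s != '\0') {
        h = (h * 33u) ^ (uint32_t)(unsigned char)*s++;
    }
    return h;
}

/* Lookup: compare the precomputed hashes first and fall back to strcmp
 * only when the hashes match, so collisions are still filtered out.
 * Because most registry keys share a long common prefix (e.g. "orte-..."),
 * a hash mismatch skips the strcmp that would otherwise scan that prefix
 * for every non-matching entry. */
dict_entry_t *dict_lookup(dict_entry_t *entries, size_t n, const char *key)
{
    uint32_t h = compute_hash(key);
    for (size_t i = 0; i < n; ++i) {
        if (entries[i].hash == h && strcmp(entries[i].key, key) == 0) {
            return &entries[i];
        }
    }
    return NULL;
}

Storing the hash at insert time means each query pays one hash of the probe key plus cheap integer compares, with strcmp running only for entries whose hashes already agree.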