Sigh... is it really so much to ask that we at least run the tests in
orte/test/system and orte/test/mpi using both mpirun and singletons (where
appropriate) instead of just relying on "well, I ran hello_world"?

That is all I have ever asked, yet it seems to be viewed as a huge
impediment. Is it really that much to ask for when modifying a core part of
the system? :-/

If you have done those tests, then my apologies - but your note only indicates
that you ran "hello_world" and are basing your recommendation *solely* on
that test.


On 6/6/07 7:51 AM, "Tim Prins" <tpr...@open-mpi.org> wrote:

> I hate to go back to this, but...
> 
> The original commits also included changes to gpr_replica_dict_fn.c
> (r14331 and r14336). This change shows some performance improvement for
> me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness
> in the gpr. Again, this is an algorithmic change, so as the job scales the
> performance improvement would be more noticeable.
> 
> I vote that this be put back in.
> 
> On a related topic, a fix for a small memory leak went in as r14328 and was
> then reverted. That fix should also be put back in.
> 
> Tim
> 
> George Bosilca wrote:
>> Commit r14791 applies this patch to the trunk. Let me know if you
>> encounter any kind of trouble.
>> 
>>   Thanks,
>>     george.
>> 
>> On May 29, 2007, at 2:28 PM, Ralph Castain wrote:
>> 
>>> After some work off-list with Tim, it appears that something has been
>>> broken again on the OMPI trunk with respect to comm_spawn. It was
>>> working two weeks ago, but...sigh.
>>> 
>>> Anyway, it doesn't appear to have any bearing either way on George's
>>> patch(es), so whoever wants to commit them is welcome to do so.
>>> 
>>> Thanks
>>> Ralph
>>> 
>>> 
>>> On 5/29/07 11:44 AM, "Ralph Castain" <r...@lanl.gov> wrote:
>>> 
>>>> 
>>>> 
>>>> 
>>>> On 5/29/07 11:02 AM, "Tim Prins" <tpr...@open-mpi.org> wrote:
>>>> 
>>>>> Well, after fixing many of the tests...
>>>> 
>>>> Interesting - they worked fine for me. Perhaps a difference in
>>>> environment.
>>>> 
>>>>> It passes all the tests except the spawn tests. However, the spawn
>>>>> tests are seriously broken without this patch as well, and the ibm
>>>>> mpi spawn tests seem to work fine.
>>>> 
>>>> Then something is seriously wrong. The spawn tests were working as of
>>>> my last commit - that is a test I religiously run. If the spawn test
>>>> here doesn't work, then it is hard to understand how the mpi spawn can
>>>> work, since the call is identical.
>>>> 
>>>> Let me see what's wrong first...
>>>> 
>>>>> 
>>>>> As far as I'm concerned, this should assuage any fear of problems
>>>>> with these changes and they should now go in.
>>>>> 
>>>>> Tim
>>>>> 
>>>>> On May 29, 2007, at 11:34 AM, Ralph Castain wrote:
>>>>> 
>>>>>> Well, I'll be the voice of caution again...
>>>>>> 
>>>>>> Tim: did you run all of the orte tests in the orte/test/system
>>>>>> directory? If so, and they all run correctly, then I have no issue
>>>>>> with doing the commit. If not, then I would ask that we not do the
>>>>>> commit until that has been done.
>>>>>> 
>>>>>> In running those tests, you need to run them on a multi-node system,
>>>>>> both using mpirun and as singletons (you'll have to look at the tests
>>>>>> to see which ones make sense in the latter case). This will ensure
>>>>>> that we have at least some degree of coverage.
>>>>>> 
>>>>>> Thanks
>>>>>> Ralph
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 5/29/07 9:23 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote:
>>>>>> 
>>>>>>> I'd be happy to commit the patch into the trunk. But after what
>>>>>>> happened last time, I'm more than cautious. If the community thinks
>>>>>>> the patch is worth having, let me know and I'll push it into the
>>>>>>> trunk asap.
>>>>>>> 
>>>>>>>    Thanks,
>>>>>>>      george.
>>>>>>> 
>>>>>>> On May 29, 2007, at 10:56 AM, Tim Prins wrote:
>>>>>>> 
>>>>>>>> I think both patches should be put in immediately. I have done some
>>>>>>>> simple testing, and on 128 nodes of odin with 1024 processes running
>>>>>>>> mpi hello, these decrease our running time from about 14.2 seconds
>>>>>>>> to 10.9 seconds (roughly a 23% reduction). This is a significant
>>>>>>>> decrease, and as the scale increases there should be increasing
>>>>>>>> benefit.
>>>>>>>> 
>>>>>>>> I'd be happy to commit these changes if no one objects.
>>>>>>>> 
>>>>>>>> Tim
>>>>>>>> 
>>>>>>>> On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:
>>>>>>>> 
>>>>>>>>> Thanks - I'll take a look at this (and the prior ones!) in the
>>>>>>>>> next couple of weeks when time permits and get back to you.
>>>>>>>>> 
>>>>>>>>> Ralph
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 5/23/07 1:11 PM, "George Bosilca" <bosi...@cs.utk.edu> wrote:
>>>>>>>>> 
>>>>>>>>>> Attached is another patch to the ORTE layer, more specifically
>>>>>>>>>> the replica. The idea is to decrease the number of strcmp calls
>>>>>>>>>> by using a small hash function before doing the strcmp. The hash
>>>>>>>>>> key for each registry entry is computed when it is added to the
>>>>>>>>>> registry. When we're doing a query, instead of comparing the 2
>>>>>>>>>> strings we first check whether the hash keys match, and only if
>>>>>>>>>> they do match do we compare the 2 strings, in order to make sure
>>>>>>>>>> we eliminate collisions from our answers.
>>>>>>>>>> 
>>>>>>>>>> There is some benefit in terms of performance. It's hardly
>>>>>>>>>> visible for a few processes, but it starts showing up when the
>>>>>>>>>> number of processes increases. In fact, the number of strcmp
>>>>>>>>>> calls in the trace file decreases drastically. The main reason it
>>>>>>>>>> works well is that most of the keys start with basically the same
>>>>>>>>>> chars (such as orte-blahblah), which forces each strcmp to loop
>>>>>>>>>> over that common prefix before it can find a difference.
>>>>>>>>>> 
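[Editor's note: a minimal sketch of the pre-hash comparison described above,
using hypothetical type and function names rather than the actual gpr replica
code.]

    #include <stdint.h>
    #include <string.h>

    /* one registry entry: the hash is computed once, when the key is stored */
    typedef struct {
        uint32_t hash;
        char    *key;
    } keyval_t;

    /* small djb2-style hash over the key string */
    static uint32_t compute_hash(const char *s)
    {
        uint32_t h = 5381;
        while (*s != '\0') {
            h = (h * 33u) ^ (uint32_t)(unsigned char)*s;
            s++;
        }
        return h;
    }

    /* query-time comparison: a mismatched hash rejects the entry with a
     * single integer compare; strcmp runs only when the hashes collide or
     * the keys really are equal, which avoids repeatedly walking the
     * common "orte-..." prefixes */
    static int keys_match(const keyval_t *entry, uint32_t query_hash,
                          const char *query_key)
    {
        if (entry->hash != query_hash) {
            return 0;
        }
        return strcmp(entry->key, query_key) == 0;
    }

[On insert, the registry would store compute_hash(key) alongside the key; a
lookup computes the query's hash once and then calls keys_match against each
candidate entry.]
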
>>>>>>>>>> Ralph, please consider it for inclusion in the ORTE layer.
>>>>>>>>>> 
>>>>>>>>>>    Thanks,
>>>>>>>>>>      george.
>>>>>>>>>> 
>>>>>>>>>> 