Re: [OMPI devel] matching code rewrite in OB1

Richard Graham Thu, 13 Dec 2007 18:17:07 -0500

The situation that needs to be triggered, just as George mentioned, is one
where we have a lot of unexpected messages.  The test has to make sure that
when a message we can match against comes in, every unexpected message that
can now be matched with a pre-posted receive is matched as well.  Since we
attempt to match only when a new fragment comes in, we must not leave
matchable messages behind in the unexpected queue, as these (if the
out-of-order scenario is just right) would block any new matches from
occurring.


For example:  Say the next expected message is 25

Unexpected message queue has:  26 28 29 ..

If 25 comes in and is handled, but 26 is not pulled off the unexpected
message queue, then when 27 comes in it can't be matched: 26 is still
sitting in the unexpected queue and will never be looked at again ...
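
A minimal sketch of that scenario, as a toy Python model and not the OB1
code.  The names Matcher, on_fragment, held and drain are made up for
illustration, and a single peer with plain integer sequence numbers is
assumed.  With drain=False the queue is only consulted for the fragment
that just arrived, which reproduces the stall described above.

```python
class Matcher:
    """Toy model of sequence-ordered matching for one peer (hypothetical)."""

    def __init__(self, next_expected):
        self.next_expected = next_expected  # next sequence number we may match
        self.held = set()                   # fragments that arrived ahead of sequence
        self.matched = []                   # order in which fragments were matched

    def on_fragment(self, seq, drain=True):
        # A fragment that is not the next expected one is parked.
        if seq != self.next_expected:
            self.held.add(seq)
            return
        self._match(seq)
        # After a match, pull every parked fragment that is now in sequence.
        while drain and self.next_expected in self.held:
            self.held.remove(self.next_expected)
            self._match(self.next_expected)

    def _match(self, seq):
        self.matched.append(seq)
        self.next_expected = seq + 1


def run(drain):
    m = Matcher(next_expected=25)
    for seq in (26, 28, 29):      # already sitting in the queue
        m.on_fragment(seq, drain)
    m.on_fragment(25, drain)      # the expected message arrives
    m.on_fragment(27, drain)      # then 27 arrives
    return m.matched, sorted(m.held)


print(run(drain=True))    # ([25, 26, 27, 28, 29], [])
print(run(drain=False))   # ([25], [26, 27, 28, 29])
```

Without the drain step only 25 is ever matched, and 26 through 29 stay
parked because nothing revisits the queue.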

Rich


On 12/13/07 2:09 PM, "George Bosilca" <bosi...@eecs.utk.edu> wrote:

> Rich was referring to the fact that the reordering of fragments other
> than the matching ones is irrelevant to Gleb's change. In order to
> trigger the changes we need to force a lot of small unexpected
> messages over multiple networks. The testing environment should have
> multiple similar networks (to make sure the matching fragment is
> distributed evenly across them), and the test should generate a lot of
> unexpected messages. I think the flood test is a good base for this.
> 
>    Thanks,
>      george.
> 
> 
> On Dec 12, 2007, at 5:04 PM, Jeff Squyres wrote:
> 
>> Was Rich referring to ensuring that the test codes checked that their
>> payloads were correct (and not re-assembled in the wrong order)?
>> 
>> 
>> On Dec 12, 2007, at 4:10 PM, Brian W. Barrett wrote:
>> 
>>> On Wed, 12 Dec 2007, Gleb Natapov wrote:
>>> 
>>>> On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
>>>>> This is better than nothing, but really not very helpful for looking
>>>>> at the specific issues that can arise with this, unless these systems
>>>>> have several parallel networks, with tests that will generate a lot of
>>>>> parallel network traffic and be able to self-check for out-of-order
>>>>> receives - i.e. this needs to be encoded into the payload for
>>>>> verification purposes.  There are some out-of-order scenarios that
>>>>> need to be generated and checked.  I think that George may have a
>>>>> system that will be good for this sort of testing.
>>>>> 
>>>> I am running various tests with multiple networks right now. I use
>>>> several IB BTLs and TCP BTL simultaneously. I see many reordered
>>>> messages and all tests were OK till now, but they don't encode
>>>> message sequence in a payload as far as I know. I'll change one of
>>>> them to do so.
>>> 
>>> Other than Rich's comment that we need sequence numbers, why add them?
>>> We haven't had them for non-matching packets for the last 3 years in
>>> Open MPI (ie, forever), and I can't see why we would need them.  Yes,
>>> we need sequence numbers for match headers to make sure MPI ordering is
>>> correct.  But for the rest of the payload, there's no need with OMPI's
>>> datatype engine.  It's just more payload for no gain.
>>> 
>>> Brian
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> -- 
>> Jeff Squyres
>> Cisco Systems
> 
