Rich was referring to the fact that reordering of fragments other than the matching ones is irrelevant to Gleb's change. To trigger the change, we need to force a lot of small unexpected messages over multiple networks. The testing environment should have multiple similar networks (to make sure the matching fragments are distributed evenly across them), and the test should generate a lot of unexpected messages. I think the flood test is a good base for this.

  Thanks,
    george.


On Dec 12, 2007, at 5:04 PM, Jeff Squyres wrote:

Was Rich referring to ensuring that the test codes checked that their
payloads were correct (and not re-assembled in the wrong order)?


On Dec 12, 2007, at 4:10 PM, Brian W. Barrett wrote:

On Wed, 12 Dec 2007, Gleb Natapov wrote:

On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
This is better than nothing, but really not very helpful for looking at
the specific issues that can arise with this, unless these systems have
several parallel networks, with tests that will generate a lot of
parallel network traffic, and be able to self check for out-of-order
received - i.e. this needs to be encoded into the payload for
verification purposes.  There are some out-of-order scenarios that need
to be generated and checked.  I think that George may have a system that
will be good for this sort of testing.

I am running various tests with multiple networks right now. I use
several IB BTLs and TCP BTL simultaneously. I see many reordered
messages and all tests were OK till now, but they don't encode
message sequence in a payload as far as I know. I'll change one of
them to do so.
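
A minimal sketch of what "encoding the message sequence in the payload" could look like, assuming a test that sends buffers of 32-bit words. The names pattern_word, fill_payload and verify_payload are hypothetical and are not taken from any existing Open MPI test. Each word depends on the sender, the message sequence number and the word offset, so a fragment copied to the wrong offset, or taken from a different message, no longer matches what the receiver expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern: one 32-bit word derived from (sender, seq, offset).
 * Multiplying by an odd constant is a bijection modulo 2^32, so for a fixed
 * sender two different seq values, or two different offsets, always yield
 * different words. */
uint32_t pattern_word(uint32_t sender, uint32_t seq, uint32_t offset)
{
    return (sender * 2654435761u) ^ (seq * 40503u) ^ (offset + 0x9e3779b9u);
}

/* Sender side: fill the buffer before handing it to the send call. */
void fill_payload(uint32_t *buf, size_t nwords, uint32_t sender, uint32_t seq)
{
    for (size_t i = 0; i < nwords; i++) {
        buf[i] = pattern_word(sender, seq, (uint32_t)i);
    }
}

/* Receiver side: return the index of the first corrupted word, or -1 if
 * the whole payload matches the expected (sender, seq) pattern. */
long verify_payload(const uint32_t *buf, size_t nwords,
                    uint32_t sender, uint32_t seq)
{
    for (size_t i = 0; i < nwords; i++) {
        if (buf[i] != pattern_word(sender, seq, (uint32_t)i)) {
            return (long)i;
        }
    }
    return -1;
}
```

In a test, the sender would call fill_payload with its own rank and a per-destination counter, and the receiver would call verify_payload with the source rank and the sequence number it expects next from that source. The returned index shows where the received data first diverges.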

Other than Rich's comment that we need sequence numbers, why add them?
We haven't had them for non-matching packets for the last 3 years in
Open MPI (ie, forever), and I can't see why we would need them.  Yes, we
need sequence numbers for match headers to make sure MPI ordering is
correct.  But for the rest of the payload, there's no need with OMPI's
datatype engine.  It's just more payload for no gain.

Brian
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
