On Jun 7, 2007, at 9:11 AM, George Bosilca wrote:

There is something weird with this change, and the patch reflects it. The new argument "order" comes from the PML level and might be MCA_BTL_NO_ORDER (which is kind of global) or BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP (which are definitely Open IB related). Do you really intend to let the PML know about Open IB internal constants?

No, the PML knows only one thing about the order tag: it is either MCA_BTL_NO_ORDER or something that the BTL assigns. The PML has no idea about BTL_OPENIB_LP_QP or BTL_OPENIB_HP_QP; to the PML it is just an order tag assigned to a fragment by the BTL.

So the semantics are that after a btl_send/put/get an order tag may be assigned by the BTL to the descriptor. This order tag can then be specified to subsequent calls to btl_alloc or btl_prepare. The PML has no idea what the value means, other than that it is requesting a descriptor that will be ordered w.r.t. a previously transmitted descriptor.
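
A minimal sketch of these semantics as seen from the PML side, not the actual Open MPI code. Every `sketch_` name, the descriptor struct, and the numeric value of the "no order" constant are assumptions made for illustration; they only stand in for MCA_BTL_NO_ORDER and the btl_alloc/btl_send entry points discussed here.

```c
#include <assert.h>

/* Hypothetical stand-in for MCA_BTL_NO_ORDER; the value is an assumption. */
#define SKETCH_NO_ORDER 255

/* Hypothetical descriptor carrying only the order tag. */
typedef struct {
    int order;
} sketch_desc_t;

/* Toy BTL allocation: the caller passes SKETCH_NO_ORDER or a tag that an
 * earlier send handed back. */
static sketch_desc_t sketch_alloc(int order)
{
    sketch_desc_t desc = { order };
    return desc;
}

/* Toy BTL send: if no ordering was requested, the BTL picks a tag of its
 * own. The 0 used here is arbitrary and meaningless to the caller. */
static void sketch_send(sketch_desc_t *desc)
{
    if (SKETCH_NO_ORDER == desc->order) {
        desc->order = 0;
    }
}

/* PML side: read the tag only after the send, then pass it back unchanged
 * to request ordering for the next descriptor. Returns 1 if both
 * descriptors ended up with the same tag. */
static int sketch_pml_send_ordered_pair(void)
{
    sketch_desc_t first = sketch_alloc(SKETCH_NO_ORDER);
    sketch_send(&first);

    int tag = first.order;              /* opaque to the PML */
    assert(SKETCH_NO_ORDER != tag);     /* this toy BTL always assigns one */

    sketch_desc_t second = sketch_alloc(tag);
    sketch_send(&second);

    return second.order == first.order;
}
```

The PML function never compares the tag against anything except the "no order" constant, which is the whole point of the argument above.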


If it's the case (which seems to be true from the following snippet
    if(MCA_BTL_NO_ORDER == order) {
        frag->base.order = BTL_OPENIB_LP_QP;
    } else {
        frag->base.order = order;
    }
So I am choosing some ordering to use here because the PML told me he doesn't care. What is wrong with this?



) I expect you to revise the patch in order to propose a generic solution or I'll trigger a vote against the patch.
This exports no knowledge of the Open IB BTL to the PML layer, the PML doesn't know that this is a QP index, he doesn't care! The PML simply uses this value (if it wants to) to request ordering with subsequent fragments. We use the QP index only as a BTL optimization, it could have been anything. So the only new knowledge that the PML has is how to request that ordering of fragments be enforced, and the BTL doesn't even have to provide this if it doesn't want, that is the reason for MCA_BTL_NO_ORDER.


Please describe a use case where this is not a generic solution. Keep in mind that MX, TCP, and GM can all provide ordering guarantees if they wish; in fact, for MX you can simply always assign an order tag, say with the value 1. MX can then guarantee ordering of all fragments sent over the same BTL.
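
To make the MX example concrete, here is a small sketch of two hypothetical BTL policies: one that stamps every descriptor with the same tag, and one that declines to provide ordering at all. None of these names exist in Open MPI; the `sketch_` identifiers and the value of the "no order" constant are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for MCA_BTL_NO_ORDER; the value is an assumption. */
#define SKETCH_NO_ORDER 255

/* "Say the value is 1": a single tag shared by every fragment. */
#define SKETCH_SINGLE_ORDER_TAG 1

typedef struct {
    int order;
} sketch_desc_t;

/* A transport that delivers everything on one BTL in order can stamp the
 * same tag on every descriptor it sends. */
static void sketch_ordered_send(sketch_desc_t *desc)
{
    desc->order = SKETCH_SINGLE_ORDER_TAG;
}

/* A BTL that does not want to provide ordering leaves the tag unset. */
static void sketch_unordered_send(sketch_desc_t *desc)
{
    desc->order = SKETCH_NO_ORDER;
}

/* The only question the upper layer asks: was a tag handed back? */
static int sketch_can_order_after(const sketch_desc_t *desc)
{
    return SKETCH_NO_ORDER != desc->order;
}
```

In both cases the caller's logic is identical; only the answer from `sketch_can_order_after` differs.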


I vote for it to be backed out of the trunk, as it exports way too much knowledge from the Open IB BTL into the PML layer.

The only other option that I have identified that doesn't push PML level protocol into the BTL is to require that BTLs always guarantee ordering of fragments sent/put/get over the same BTL.



  george.

PS: With Gleb's changes the problem is the same. The following snippet reflects exactly the same behavior as the original patch.

Gleb's changes don't change the semantic guarantees that I have described above.




frag->base.order = order;
assert(frag->base.order != BTL_OPENIB_HP_QP);

On Jun 7, 2007, at 9:49 AM, Gleb Natapov wrote:

Hi Galen,

On Sun, May 27, 2007 at 10:19:09AM -0600, Galen Shipman wrote:

With current code this is not the case. Order tag is set during a
fragment allocation. It seems wrong according to your description.
Attached patch fixes this. If no specific ordering tag is provided to
allocation function, order of the fragment is set to be
MCA_BTL_NO_ORDER. After call to send/put/get, order is set to whatever
QP was used for communication. If order is set before send call, it is
used to choose QP.
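
The behavior described for the patch can be sketched roughly as follows. This is not the patch itself: the `sketch_` names, the QP index values, the value of the "no order" constant, and the `default_qp` parameter (standing in for whatever rule the BTL uses to pick a QP) are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for MCA_BTL_NO_ORDER; the value is an assumption. */
#define SKETCH_NO_ORDER 255

/* Hypothetical QP indices standing in for BTL_OPENIB_HP_QP / BTL_OPENIB_LP_QP. */
enum { SKETCH_HP_QP = 0, SKETCH_LP_QP = 1 };

typedef struct {
    int order;
} sketch_frag_t;

/* Allocation only records what the caller asked for; passing
 * SKETCH_NO_ORDER leaves the fragment unordered until it is sent. */
static void sketch_alloc(sketch_frag_t *frag, int order)
{
    frag->order = order;
}

/* Send: a tag set beforehand selects the QP; otherwise the BTL falls back
 * to default_qp. Either way the QP actually used is written back as the
 * order tag. Returns the QP index used. */
static int sketch_send(sketch_frag_t *frag, int default_qp)
{
    int qp = (SKETCH_NO_ORDER == frag->order) ? default_qp : frag->order;

    assert(SKETCH_HP_QP == qp || SKETCH_LP_QP == qp);
    frag->order = qp;
    return qp;
}
```

A fragment allocated with the tag of an earlier send goes out on the same QP regardless of `default_qp`, which is how the ordering request is honored in this sketch.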


I do set the order tag during allocation/prepare, but the defined
semantics are that the tag is only valid after send/put/get. We can
set them up anywhere we wish in the BTL; the PML, however, cannot rely
on anything until after the send/put/get call. So really this is an
issue of semantics versus implementation. The implementation I
believe does conform to the semantics as the upper layer (PML)
doesn't use the tag value until after a call to send/put/get.

I will look over the patch, however; it might make more sense to delay
setting the value until the actual send/put/get call.

Have you had a chance to look over the patch?

--
                        Gleb.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
