Hi George,

Thanks for the feedback.

This PR was only meant to address one piece (a first step) of handling
thread-based progression for RDMA-capable NICs within OMPI. It by no
means represents a complete solution. That more complete solution is
what I understood we were interested in for the long term, and it would
certainly need an RFC, since it could involve things like extending the
PML interface and using these extensions in parts of libnbc, etc.

I would have liked to use the PRIORITY flag, except that the way it is
used differs from what we want, which is to generate IRQs only for the
rendezvous control messages. The problem is this part:

/* try to get a small message out on to the wire quickly */
static inline int mca_pml_ob1_send_inline (void *buf, size_t count,

Using PRIORITY would therefore do exactly what we want to avoid, namely
generate interrupts for small eager messages, at least in the typical
case.

It's somewhat unfortunate that, the way the PML/BML/BTL is currently
designed, the PML is in charge of how rendezvous messages are
transferred for RDMA-capable devices. If the approach were more like
the one taken in the MPICH Nemesis ch3 device, wherein a Nemesis netmod
is fully responsible for sending the data using its own algorithms
(with the option to leverage the LMT framework Darius Buntinas
designed), less "knowledge" of how a given RDMA device works would have
had to be incorporated into the PML. With the Nemesis design, I was
able to hide a lot about the Cray XE/XC network from Nemesis, including
the thread-based progression part, the particulars of memory
registration for the Cray network, etc. With the Nemesis design, there
was also no need for a base_descriptor back-and-forth between a Nemesis
netmod and the CH3 device layers, like the one that occurs in the
PML/BML/BTL design for send/recv-style data transfers.

Nathan and I are planning to use the SIGNAL flag, along with some
additional glue, to add an option for thread-based progression in the
vader BTL, but other projects have higher priority at the moment.

We can reuse the SIGNAL DES flag if we find that extending the PML
interface to include something with a signal concept is appropriate.
The one area I know of that might benefit from such a concept is
non-blocking collectives. But it may turn out to be easier just to
reuse libnbc in another coll component that would be aware of
particular RDMA networks' capabilities, and avoid having to extend the
PML interface with unnecessary methods.

Hope this helps,

Howard



2015-01-09 15:30 GMT-07:00 George Bosilca <bosi...@icl.utk.edu>:

> I have some comments about this ticket and the corresponding patch.
> Honestly, the patch lacks most of the things we have talked about during
> our last developers meeting. However, my main concern in this particular
> email is about the SIGNAL flag.
>
> 1. Currently there is little difference between this flag and
> PRIORITY, and I would like to hear a justification for that.
>
> 2. Moreover, right now SIGNAL is a purely PML decision. Again, we talked
> about this and decided that the upper layer (this meant whoever is using
> the PML) was to define this policy. We specifically said that this should
> not be a PML-level decision, because the PML lacks the means to make the
> right decision about what should be signaled and what not. The current code
> signals most of the PML control logic, including some of the matching logic
> (but not all for some obscure reason). Based on my understanding of the
> code, one didn't need to pollute the PML code for this, it could have just
> used the PRIORITY flag instead.
>
> Additionally, if my memory serves, we decided that this should be
> thoroughly evaluated before pushing it into the trunk, and here
> "thoroughly" meant over multiple BTLs and so on. Obviously, I missed the
> email thread about the evaluation of this flag on UGNI. I guess I might not
> be the only one, so I would really appreciate it if someone could repost it.
>
>   George.
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/01/16774.php
>
