OK, thanks for the clarification. When can we test the code via OFED?
--CQ

> -----Original Message-----
> From: Ishai Rabinovitz [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, January 03, 2008 9:55 AM
> To: Tang, Changqing; [EMAIL PROTECTED]; Jack Morgenstein; Pavel Shamis
> Cc: Gleb Natapov; Roland Dreier; [email protected]
> Subject: RE: [ofa-general] [RFC] XRC -- make receiving XRC QP
> independent of any one user process
>
> CQ, you are right.
>
> And there is no race, because the register and deregister are locked
> in the kernel using the same spin lock.
>
> So in the MPI implementation, when C finds out that the QP is no
> longer valid, it should send a reject back to A, and then A asks C to
> also open a new QP.
>
> Ishai
>
> > -----Original Message-----
> > From: Tang, Changqing [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, 03 January 2008 17:49
> > To: Ishai Rabinovitz; [EMAIL PROTECTED]; Jack Morgenstein;
> > Pavel Shamis
> > Cc: Gleb Natapov; Roland Dreier; [email protected]
> > Subject: RE: [ofa-general] [RFC] XRC -- make receiving XRC QP
> > independent of any one user process
> >
> > Thanks for the comment.
> >
> > Another issue I have after thinking about the interface more.
> >
> > Rank A is the sender; ranks B and C are two ranks on a remote node.
> > At first, B creates the receiving QP, makes a connection to A, and
> > registers the QP number for receiving; A gets the receiving QP
> > number from B. After some communication between A and B, B decides
> > to close the connection and unregisters the QP number. Then A and C
> > want to talk, so A tells C the receiving QP number, and C tries to
> > register it.
> >
> > I wonder whether, at the time C tries to register the QP number,
> > the receiving QP has already been destroyed by the kernel, since
> > when B unregisters the QP number the reference count becomes zero,
> > and the kernel will clean it up.
> >
> > Am I right?
> > --CQ
> >
> > > -----Original Message-----
> > > From: Ishai Rabinovitz [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, January 03, 2008 2:59 AM
> > > To: [EMAIL PROTECTED]; Tang, Changqing; Jack Morgenstein;
> > > Pavel Shamis
> > > Cc: Gleb Natapov; Roland Dreier; [email protected]
> > > Subject: RE: [ofa-general] [RFC] XRC -- make receiving XRC QP
> > > independent of any one user process
> > >
> > > Please see my comments (prefix [Ishai]).
> > >
> > > -----Original Message-----
> > > From: Tang, Changqing [mailto:[EMAIL PROTECTED]]
> > > Sent: Wednesday, 02 January 2008 17:27
> > > To: Jack Morgenstein; Pavel Shamis
> > > Cc: Ishai Rabinovitz; Gleb Natapov; Roland Dreier;
> > > [email protected]
> > > Subject: RE: [ofa-general] [RFC] XRC -- make receiving XRC QP
> > > independent of any one user process
> > >
> > > This interface is OK for me.
> > >
> > > Now, every rank on a node that wants to receive messages from the
> > > same remote rank must know the same receiving QP number, and
> > > register for receiving using this QP number.
> > >
> > > If rank B does not register (the receiving QP has been created by
> > > another rank A on the node), and the sender knows B's SRQ number,
> > > can B still receive a message the sender sends to it? (I hope: no
> > > register, no receive.)
> > >
> > > [Ishai] I guess that, from the MPI layer's perspective, the sender
> > > cannot know B's SRQ number until it asks B for it. So B can
> > > register with this QP before sending the SRQ number.
> > >
> > > I hope to hear the opinion of the other MPI teams, or of other XRC
> > > users.
> > >
> > > [Ishai] We already discussed these issues with the Open MPI IB
> > > group, and it looks fine to them. I'm sending this mail to Prof.
> > > Panda, so he can comment on it as well.
> > > --CQ
> > >
> > > > -----Original Message-----
> > > > From: Jack Morgenstein [mailto:[EMAIL PROTECTED]]
> > > > Sent: Monday, December 31, 2007 5:40 AM
> > > > To: [EMAIL PROTECTED]
> > > > Cc: [EMAIL PROTECTED]; Gleb Natapov; Roland Dreier;
> > > > Tang, Changqing; [email protected]
> > > > Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP
> > > > independent of any one user process
> > > >
> > > > > Tang, Changqing wrote:
> > > > > > If I have MPI server processes on a node, many other MPI
> > > > > > client processes will dynamically connect/disconnect with
> > > > > > the server. The server uses the same XRC domain.
> > > > > >
> > > > > > Will this cause the "kernel" QPs to accumulate for such an
> > > > > > application? We want the server to run 365 days a year.
> > > > > >
> > > > > > I have a question about the scenario above: did you call
> > > > > > MPI disconnect on both ends (server/client) before the
> > > > > > client exits (must we do it)?
> > > > >
> > > > > Yes, both ends will call disconnect. But for us, the
> > > > > MPI_Comm_disconnect() call is not a collective call; it is
> > > > > just a local operation.
> > > > >
> > > > > --CQ
> > > >
> > > > Possible solution (internal review as yet):
> > > >
> > > > Each user process registers with the XRC QP:
> > > > a. Each process registers ONCE. If it registers multiple times,
> > > >    there is no reference increment -- rather, the registration
> > > >    succeeds, but only one PID entry is kept per QP.
> > > > b. Cleanup can happen in the event of a process dying suddenly.
> > > > c. The QP cannot be destroyed while any user processes are
> > > >    still registered with it.
> > > > The libibverbs API is as follows:
> > > >
> > > > ======================================================================
> > > > /**
> > > >  * ibv_xrc_rcv_qp_alloc - creates an XRC QP for serving as a
> > > >  * receive-side-only QP, and moves the created QP through the
> > > >  * RESET->INIT and INIT->RTR transitions. (The RTR->RTS
> > > >  * transition is not needed, since this QP does no sending.)
> > > >  * The sending XRC QP uses this QP as its destination, while
> > > >  * specifying an XRC SRQ for actually receiving the
> > > >  * transmissions and generating all completions on the
> > > >  * receiving side.
> > > >  *
> > > >  * This QP is created in kernel space, and persists until the
> > > >  * last process registered for the QP calls
> > > >  * ibv_xrc_rcv_qp_unregister() (at which time the QP is
> > > >  * destroyed).
> > > >  *
> > > >  * @pd: protection domain to use. At a lower layer, this
> > > >  *      provides access to the userspace object.
> > > >  * @xrc_domain: XRC domain to use for the QP.
> > > >  * @attr: modify-qp attributes needed to bring the QP to RTR.
> > > >  * @attr_mask: bitmap indicating which attributes are provided
> > > >  *      in the attr struct; used for validity checking.
> > > >  * @xrc_rcv_qpn: qp_num of the created QP (on success). To be
> > > >  *      passed to the remote node (sender). The remote node
> > > >  *      will use xrc_rcv_qpn in ibv_post_send when sending to
> > > >  *      XRC SRQs on this host in the same XRC domain.
> > > >  *
> > > >  * RETURNS: success (0), or a (negative) error value.
> > > >  *
> > > >  * NOTE: this verb also registers the calling user process with
> > > >  *      the QP at its creation time (an implicit call to
> > > >  *      ibv_xrc_rcv_qp_register), to avoid race conditions.
> > > >  *      The creating process will need to call
> > > >  *      ibv_xrc_rcv_qp_unregister() to release the QP from
> > > >  *      this process.
> > > >  */
> > > > int ibv_xrc_rcv_qp_alloc(struct ibv_pd *pd,
> > > >                          struct ibv_xrc_domain *xrc_domain,
> > > >                          struct ibv_qp_attr *attr,
> > > >                          enum ibv_qp_attr_mask attr_mask,
> > > >                          uint32_t *xrc_rcv_qpn);
> > > >
> > > > =====================================================================
> > > >
> > > > /**
> > > >  * ibv_xrc_rcv_qp_register: registers a user process with an
> > > >  * XRC QP which serves as a receive-side-only QP.
> > > >  *
> > > >  * @xrc_domain: XRC domain the QP belongs to (for verification).
> > > >  * @xrc_qp_num: The (24-bit) number of the XRC QP.
> > > >  *
> > > >  * RETURNS: success (0),
> > > >  *      or error (-EINVAL), if:
> > > >  *      1. There is no such QP number allocated.
> > > >  *      2. The QP is allocated, but is not a receive XRC QP.
> > > >  *      3. The XRC QP does not belong to the given domain.
> > > >  */
> > > > int ibv_xrc_rcv_qp_register(struct ibv_xrc_domain *xrc_domain,
> > > >                             uint32_t xrc_qp_num);
> > > >
> > > > =====================================================================
> > > > /**
> > > >  * ibv_xrc_rcv_qp_unregister: detaches a user process from an
> > > >  * XRC QP serving as a receive-side-only QP. If, as a result,
> > > >  * no userspace processes remain registered for this XRC QP,
> > > >  * it is destroyed.
> > > >  *
> > > >  * @xrc_domain: XRC domain the QP belongs to (for verification).
> > > >  * @xrc_qp_num: The (24-bit) number of the XRC QP.
> > > >  *
> > > >  * RETURNS: success (0),
> > > >  *      or error (-EINVAL), if:
> > > >  *      1. There is no such QP number allocated.
> > > >  *      2. The QP is allocated, but is not an XRC QP.
> > > >  *      3. The XRC QP does not belong to the given domain.
> > > >  * NOTE: I don't see any reason to return a special code if the
> > > >  *      QP is destroyed -- the unregister simply succeeds.
> > > >  */
> > > > int ibv_xrc_rcv_qp_unregister(struct ibv_xrc_domain *xrc_domain,
> > > >                               uint32_t xrc_qp_num);
> > > > =====================================================================
> > > >
> > > > Usage:
> > > >
> > > > 1. The sender creates an XRC QP (the sending QP).
> > > > 2. The sender sends some receiving process on a remote node
> > > >    (say R1) a request to provide an XRC QP and XRC SRQ for
> > > >    receiving messages (the request includes the sending QP
> > > >    number).
> > > > 3. R1 calls ibv_xrc_rcv_qp_alloc() to create a receiving XRC QP
> > > >    in kernel space and move that QP up to the RTR state. This
> > > >    function also registers process R1 with the XRC QP.
> > > > 4. R1 calls ibv_create_xrc_srq() to create an SRQ for receiving
> > > >    messages via the just-created XRC QP.
> > > > 5. R1 responds to the request, providing the XRC QP number and
> > > >    XRC SRQ number to be used in communication.
> > > > 6. The sender then may wish to communicate with another
> > > >    receiving process on the remote host (say R2). It sends a
> > > >    request to R2 containing the remote XRC QP number (obtained
> > > >    from R1) which it will use to send messages.
> > > > 7. R2 creates an XRC SRQ (if one does not already exist for the
> > > >    domain), and also calls ibv_xrc_rcv_qp_register() to register
> > > >    process R2 with the XRC QP created by R1.
> > > > 8. If R1 no longer needs to communicate with the sender, it
> > > >    calls ibv_xrc_rcv_qp_unregister() for the QP. The QP will not
> > > >    yet be destroyed, since R2 is still registered with it.
> > > > 9. If R2 no longer needs to communicate with the sender, it
> > > >    calls ibv_xrc_rcv_qp_unregister() for the QP. At this point,
> > > >    the QP is destroyed, since no processes remain registered
> > > >    with it.
> > > > NOTES:
> > > >
> > > > 1. The problem of the QP being destroyed and quickly
> > > >    re-allocated does not exist -- the upper bits of the QP
> > > >    number are incremented at each allocation (except for the
> > > >    MSB, which is always 1 for XRC QPs). Thus, even if the same
> > > >    QP is re-allocated, its QP number (stored in the QP object)
> > > >    will differ from the expected one (unless it is
> > > >    re-destroyed/re-allocated several hundred times).
> > > >
> > > > 2. With this model, we do not need a heartbeat: if a receiving
> > > >    process dies, all XRC QPs it has registered for will be
> > > >    unregistered as part of process cleanup in kernel space.
> > > >
> > > > - Jack

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
