Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)
On Wed, Oct 26, 2011 at 10:06 PM, Roland Dreier rol...@purestorage.com wrote:
> On Wed, Oct 26, 2011 at 11:05 AM, Bart Van Assche bvanass...@acm.org wrote:
>> Can I conclude from your reply that the last WQE event refers to the SRQ
>> only and that it does not provide any information about the send queue
>> associated with the same QP?
>
> That is correct as I understand things.  The last WQE event was added to
> handle the fact that if you have a QP associated with an SRQ, you can't
> know how many receives have been taken from the SRQ and are in flight on
> the receive queue.  The IB spec is a bit vague on what "last WQE" actually
> means, but based on the background I believe it means that the last
> request on the receive queue has been processed, without saying anything
> about the send queue.

Are you sure about this? There is a paragraph in the InfiniBand Architecture Manual that can only be correct if "last WQE" refers to both the SRQ and the send queue and not to the SRQ only. It is mentioned in that manual that it is safe to reset the QP immediately after the last WQE event has been received and the CQ has been drained. A quote:

"Note, for QPs that are associated with an SRQ, the Consumer should take the QP through the Error State before invoking a Destroy QP or a Modify QP to the Reset State. The Consumer may invoke the Destroy QP without first performing a Modify QP to the Error State and waiting for the Affiliated Asynchronous Last WQE Reached Event. However, if the Consumer does not wait for the Affiliated Asynchronous Last WQE Reached Event, then WQE and Data Segment leakage may occur.

Therefore, it is good programming practice to tear down a QP that is associated with an SRQ by using the following process:
• Put the QP in the Error State;
• wait for the Affiliated Asynchronous Last WQE Reached Event;
• either:
  • drain the CQ by invoking the Poll CQ verb and either wait for the CQ to be empty or for the number of Poll CQ operations to exceed the CQ capacity; or
  • post another WR that completes on the same CQ and wait for this WR to return as a WC;
• and then invoke a Destroy QP or Reset QP."

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
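[Editor's note] The drain step in the teardown recipe quoted above can be sketched as plain C. The only non-trivial part is the loop bound: keep polling until the CQ comes up empty, but give up once the number of Poll CQ invocations exceeds the CQ capacity. `poll_one()` below is a stand-in for `ib_poll_cq()`/`ibv_poll_cq()`, and the function name and shape are illustrative assumptions, not driver code:

```c
#include <stddef.h>

/*
 * Drain a CQ after the Last WQE Reached event, per the IBA recommendation:
 * poll until the CQ is empty, or stop once the number of Poll CQ operations
 * would exceed the CQ capacity.  poll_one() models ib_poll_cq() polling a
 * single entry: it returns 1 if a completion was reaped and 0 if the CQ was
 * empty.  Returns the total number of completions drained.  This is a
 * sketch of the control flow only, not actual driver code.
 */
int drain_cq(int (*poll_one)(void *cq), void *cq, int cq_capacity)
{
	int drained = 0;
	int polls;

	for (polls = 0; polls < cq_capacity; polls++) {
		if (poll_one(cq) == 0)	/* CQ empty: all flushes reaped */
			break;
		drained++;
	}
	return drained;
}
```

After `drain_cq()` returns (or after a marker WR posted to the same CQ completes, the alternative the spec offers), it is safe to invoke Destroy QP or Reset QP.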
Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1
On Mon, Oct 24, 2011 at 7:57 AM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:
> +static int srpt_write_pending(struct se_cmd *se_cmd)
> +{
> +	struct srpt_rdma_ch *ch;
> +	struct srpt_send_ioctx *ioctx;
> +	enum srpt_command_state new_state;
> +	enum rdma_ch_state ch_state;
> +	int ret;
> +
> +	ioctx = container_of(se_cmd, struct srpt_send_ioctx, cmd);
> +
> +	new_state = srpt_set_cmd_state(ioctx, SRPT_STATE_NEED_DATA);
> +	WARN_ON(new_state == SRPT_STATE_DONE);
> +
> +	ch = ioctx->ch;
> +	BUG_ON(!ch);
> +
> +	ch_state = srpt_get_ch_state(ch);
> +	switch (ch_state) {
> +	case CH_CONNECTING:
> +		/* This code should never be reached. */
> +		__WARN();
> +		ret = -EINVAL;
> +		goto out;
> +	case CH_LIVE:
> +		break;
> +	case CH_DISCONNECTING:
> +	case CH_DRAINING:
> +	case CH_RELEASING:
> +		pr_debug("cmd with tag %lld: channel disconnecting\n",
> +			 ioctx->tag);
> +		srpt_set_cmd_state(ioctx, SRPT_STATE_DATA_IN);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +	ret = srpt_xfer_data(ch, ioctx);
> +
> +out:
> +	return ret;
> +}

If not enough memory is available to perform a SCSI write, this function returns -ENOMEM and the LIO core will report that error to the initiator. At least when mounting a filesystem on top of ib_srp and when performing asynchronous I/O, that will result in data loss. Shouldn't the LIO core retry writes as long as not enough memory is available? A similar argument holds for reads.

Bart.
Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1
On Wed, 2011-10-26 at 08:09 +0200, Bart Van Assche wrote:
> On Mon, Oct 24, 2011 at 7:57 AM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:
>> +static int srpt_write_pending(struct se_cmd *se_cmd)
>> +{
>> +	struct srpt_rdma_ch *ch;
>> +	struct srpt_send_ioctx *ioctx;
>> +	enum srpt_command_state new_state;
>> +	enum rdma_ch_state ch_state;
>> +	int ret;
>> +
>> +	ioctx = container_of(se_cmd, struct srpt_send_ioctx, cmd);
>> +
>> +	new_state = srpt_set_cmd_state(ioctx, SRPT_STATE_NEED_DATA);
>> +	WARN_ON(new_state == SRPT_STATE_DONE);
>> +
>> +	ch = ioctx->ch;
>> +	BUG_ON(!ch);
>> +
>> +	ch_state = srpt_get_ch_state(ch);
>> +	switch (ch_state) {
>> +	case CH_CONNECTING:
>> +		/* This code should never be reached. */
>> +		__WARN();
>> +		ret = -EINVAL;
>> +		goto out;
>> +	case CH_LIVE:
>> +		break;
>> +	case CH_DISCONNECTING:
>> +	case CH_DRAINING:
>> +	case CH_RELEASING:
>> +		pr_debug("cmd with tag %lld: channel disconnecting\n",
>> +			 ioctx->tag);
>> +		srpt_set_cmd_state(ioctx, SRPT_STATE_DATA_IN);
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +	ret = srpt_xfer_data(ch, ioctx);
>> +
>> +out:
>> +	return ret;
>> +}
>
> If not enough memory is available to perform a SCSI write, this function
> returns -ENOMEM and the LIO core will report that error to the initiator.
> At least when mounting a filesystem on top of ib_srp and when performing
> asynchronous I/O, that will result in data loss. Shouldn't the LIO core
> retry writes as long as not enough memory is available? A similar
> argument holds for reads.

So target_core_fabric_ops->write_pending(), ->queue_data_in() and ->queue_status() currently check for a -EAGAIN return in order to signal that QUEUE_FULL logic should kick in for the associated se_cmd descriptor. I think it's reasonable to also allow fabric modules to signal QUEUE_FULL using -ENOMEM as well, so I'll go ahead and change this for v3.2 w/ ib_srpt and will send out a patch shortly.

Also, I did notice one extra case in srpt_map_sg_to_ib_sge() -> srpt_xfer_data() that returns -EBUSY, so I'll change this to -EAGAIN to signal QUEUE_FULL as well.

Thanks for the feedback Bart!

--nab
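[Editor's note] The fix nab describes amounts to funneling the transient errnos (-ENOMEM from allocation failure, -EBUSY from srpt_map_sg_to_ib_sge()) into -EAGAIN so that the target core's QUEUE_FULL retry logic kicks in instead of a hard error reaching the initiator. A minimal sketch; the helper name is a hypothetical chosen for illustration, and the actual v3.2 patch may well inline this logic:

```c
#include <errno.h>

/*
 * Map transient fabric-module failures onto -EAGAIN, the value the target
 * core's ->write_pending()/->queue_data_in()/->queue_status() callers
 * recognize as "queue full, retry this se_cmd later".  Hard errors such as
 * -EINVAL pass through unchanged and are reported to the initiator.
 * Hypothetical helper for illustration only.
 */
int srpt_queue_full_errno(int ret)
{
	if (ret == -ENOMEM || ret == -EBUSY)
		return -EAGAIN;
	return ret;
}
```

With this mapping in place, an out-of-memory condition during a SCSI write becomes a retryable QUEUE_FULL event rather than the data-loss scenario Bart described.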
Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)
On Wed, Oct 26, 2011 at 12:04 AM, Roland Dreier rol...@purestorage.com wrote:
> Sorry, but now I'm confused about what the bug is.  You have a QP
> associated with an SRQ, and you transition the QP to error.  At some
> point you get a last WQE received event for that QP (which means all
> receive requests for that QP have completed), and you drain the CQ after
> that, which means you are guaranteed to have processed all the receive
> requests for that QP.  (At least this is how the verbs are supposed to
> work.)
>
> I don't think there is any way to guarantee that every request posted to
> the SRQ is taken by a QP.  You just have to make sure that you tear down
> every QP as above, and then you know, when there are no more QPs
> associated with the SRQ, that any receive requests that haven't completed
> are still on the SRQ, and you can free their resources after you destroy
> the SRQ.

I've learned something since I posted the message at the start of this thread: all error completions are posted on the CQ, but draining the CQ immediately after having received the last WQE event is not sufficient. The CQ polling loop has to remain active a little longer in order to receive all receive and send error completions. Issuing "rmmod ib_srpt" during I/O now works without delay for SVN trunk r3900.

I've had another look at the LIO version of ib_srpt, and the way it handles completions should ensure that all error completions get processed (if any such completions are generated, of course).

Can I conclude from your reply that the last WQE event refers to the SRQ only and that it does not provide any information about the send queue associated with the same QP?

Bart.
Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1
On Mon, Oct 24, 2011 at 8:33 PM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:
> It would have been nice if you had bothered to mention these in any of
> the pre-merge-window reviews, but given your delayed responses on these
> items they will have to be something we'll fix post merge.

Sorry for that - it took some time before all of this came back to me. If you are looking for a test case: pulling an IB cable a few times during I/O should allow you to trigger this.

Bart.
Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1
On Mon, Oct 24, 2011 at 11:01 PM, Roland Dreier rol...@purestorage.com wrote:
> On Mon, Oct 24, 2011 at 11:33 AM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:
>> - Handle IB completion timeouts. Although the InfiniBand Architecture
>> Manual specifies that an HCA must generate an error completion for each
>> pending work item when a queue pair reset is issued, an important class
>> of HCAs doesn't do this (that includes the HCA in your test setup). In
>> the SCST version of this driver such timeouts are handled by the
>> function srpt_pending_cmd_timeout().
>
> Hold on... not sure what you mean by a "queue pair reset", but the IB
> spec does not say that a flush error should be generated when a QP is
> transitioned to the reset state.  The flush error completions are
> required when a QP is transitioned to the error state, and in fact
> completion entries may be lost when a QP transitions to reset.

I was referring to transitioning a QP to the error state -- see also srpt_ch_qp_err().

> As far as I know every HCA supported by Linux does implement this
> correctly.  Which class did you have in mind as not doing that?

At least QDR ConnectX-2 HCAs with fairly recent firmware. This behavior can be reproduced easily with the SCST version of ib_srpt as follows:
- Log in from an SRP initiator to ib_srpt.
- Start a direct I/O read test, e.g. with fio.
- Issue the command "rmmod ib_srpt" on the target during I/O.

Bart.
mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)
On Mon, Oct 24, 2011 at 11:07 PM, Bart Van Assche bvanass...@acm.org wrote:
>> As far as I know every HCA supported by Linux does implement this
>> correctly.  Which class did you have in mind as not doing that?
>
> At least QDR ConnectX-2 HCAs with fairly recent firmware. This behavior
> can be reproduced easily with the SCST version of ib_srpt as follows:
> - Log in from an SRP initiator to ib_srpt.
> - Start a direct I/O read test, e.g. with fio.
> - Issue the command "rmmod ib_srpt" on the target during I/O.

OK, this is a pretty serious bug in mlx4_ib if true.  Are you sure that you really are seeing some pending work requests not generating a flush error when the QP transitions to the error state?  Keep in mind that transitioning the QP to the reset state can cause completion entries to disappear after they are generated.

 - R.
Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)
On Tue, Oct 25, 2011 at 1:17 PM, Roland Dreier rol...@purestorage.com wrote:
> On Mon, Oct 24, 2011 at 11:07 PM, Bart Van Assche bvanass...@acm.org wrote:
>>> As far as I know every HCA supported by Linux does implement this
>>> correctly.  Which class did you have in mind as not doing that?
>>
>> At least QDR ConnectX-2 HCAs with fairly recent firmware. This behavior
>> can be reproduced easily with the SCST version of ib_srpt as follows:
>> 1. Log in from an SRP initiator to ib_srpt.
>> 2. Start a direct I/O read test, e.g. with fio.
>> 3. Issue the command "rmmod ib_srpt" on the target during I/O.
>
> OK, this is a pretty serious bug in mlx4_ib if true.  Are you sure that
> you really are seeing some pending work requests not generating a flush
> error when the QP transitions to the error state?

It's a little more complex than that. The original version of ib_srpt stops polling for completions as soon as the last WQE event has been received and the queue has subsequently been drained. So I don't know whether these flush errors were not delivered or whether they were delivered too late.

Bart.
Re: mlx4 missing completions (was Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1)
On Tue, Oct 25, 2011 at 11:09 AM, Bart Van Assche bvanass...@acm.org wrote:
> It's a little more complex than that. The original version of ib_srpt
> stops polling for completions as soon as the last WQE event has been
> received and the queue has subsequently been drained. So I don't know
> whether these flush errors were not delivered or whether they were
> delivered too late.

Sorry, but now I'm confused about what the bug is.  You have a QP associated with an SRQ, and you transition the QP to error.  At some point you get a last WQE received event for that QP (which means all receive requests for that QP have completed), and you drain the CQ after that, which means you are guaranteed to have processed all the receive requests for that QP.  (At least this is how the verbs are supposed to work.)  So what goes wrong with ConnectX?

I don't think there is any way to guarantee that every request posted to the SRQ is taken by a QP.  You just have to make sure that you tear down every QP as above, and then you know, when there are no more QPs associated with the SRQ, that any receive requests that haven't completed are still on the SRQ, and you can free their resources after you destroy the SRQ.

 - R.
Re: [PATCH-v2] ib_srpt: Initial SRP Target merge for v3.2-rc1
On Mon, Oct 24, 2011 at 11:33 AM, Nicholas A. Bellinger n...@linux-iscsi.org wrote:
> - Handle IB completion timeouts. Although the InfiniBand Architecture
> Manual specifies that an HCA must generate an error completion for each
> pending work item when a queue pair reset is issued, an important class
> of HCAs doesn't do this (that includes the HCA in your test setup). In
> the SCST version of this driver such timeouts are handled by the
> function srpt_pending_cmd_timeout().

Hold on... not sure what you mean by a "queue pair reset", but the IB spec does not say that a flush error should be generated when a QP is transitioned to the reset state.  The flush error completions are required when a QP is transitioned to the error state, and in fact completion entries may be lost when a QP transitions to reset.

As far as I know every HCA supported by Linux does implement this correctly.  Which class did you have in mind as not doing that?

 - R.