> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index 35cced2..0fa4f72 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2216,6 +2216,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>         __be32 blh;
>         int i;
>
> +       if (pci_channel_offline(to_mdev(ibqp->device)->dev->pdev))
> +               return -EIO;
> +
>         spin_lock_irqsave(&qp->sq.lock, flags);
>
>         ind = qp->sq_next_wqe;

To pile on to what Or and Jack asked, why here?  Why not in post_recv?
Why not in mlx4_en?  What about userspace consumers?  What if the
error condition triggers just after the pci_channel_offline() check?
What if a command is queued but a PCI error occurs before the
completion can be returned?
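
To make the "just after the check" window concrete, here is a minimal
userspace sketch. It is a model under stated assumptions, not driver
code: struct fake_dev, fake_post_send() and the fault hooks are
hypothetical names, and the offline flag only stands in for what
pci_channel_offline() would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; not the real mlx4 structures. */
struct fake_dev {
	bool offline;		/* models the pci_channel_offline() result */
	int wqes_reached_hw;	/* work requests the "hardware" actually saw */
};

/* Hook run between the check and the hardware access, so a PCI error
 * can be injected inside the race window. */
typedef void (*fault_hook)(struct fake_dev *dev);

static void no_fault(struct fake_dev *dev) { (void)dev; }
static void fault_now(struct fake_dev *dev) { dev->offline = true; }

/* Same shape as the patched post_send: check once up front, then work. */
static int fake_post_send(struct fake_dev *dev, fault_hook between)
{
	if (dev->offline)
		return -EIO;

	between(dev);	/* the window the up-front check cannot cover */

	if (!dev->offline)
		dev->wqes_reached_hw++;	/* doorbell write reached the device */

	return 0;	/* the caller sees success in both cases */
}
```

In this model, a call made with fault_now returns 0 even though the
work request never reaches the device; only the next call gets -EIO.
The consumer still has to cope with lost work some other way.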

Is there some practical scenario where this change makes a difference?

I would assume that in case of a PCI error, the driver would notice a
catastrophic error and send that asynchronous event to consumers, who
would know that commands might have been lost.

 - R.