> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index 35cced2..0fa4f72 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2216,6 +2216,9 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  	__be32 blh;
>  	int i;
>
> +	if (pci_channel_offline(to_mdev(ibqp->device)->dev->pdev))
> +		return -EIO;
> +
>  	spin_lock_irqsave(&qp->sq.lock, flags);
>
>  	ind = qp->sq_next_wqe;

To pile on to what Or and Jack asked, why here? Why not in post_recv? Why not in mlx4_en? What about userspace consumers?

What if the error condition triggers just after the pci_channel_offline() check? What if a command is queued but a PCI error occurs before the completion can be returned?

Is there some practical scenario where this change makes a difference? I would assume that in case of a PCI error, the driver would notice a catastrophic error and send that asynchronous event to consumers, who would know that commands might have been lost.

 - R.
