From: Zhu Yanjun <yanjun....@oracle.com>
Date: Sun, 15 Apr 2018 21:02:07 -0400
> While a faulty cable is used or HCA firmware error, HCA device will
> be offline. When the driver is accessing this offline device, the
> following call trace will pop out.
 ...
> In the above call trace, the function mlx4_cmd_poll calls the function
> mlx4_cmd_post to access the HCA while HCA is offline. Then mlx4_cmd_post
> returns an error -EIO. Per -EIO, the function mlx4_cmd_poll calls
> mlx4_cmd_reset_flow to reset HCA. And the above call trace pops out.
>
> This is not reasonable. Since HCA device is offline when it is being
> accessed, it should not be reset again.
>
> In this patch, since HCA is offline, the function mlx4_cmd_post returns
> an error -EINVAL. Per -EINVAL, the function mlx4_cmd_poll directly returns
> instead of resetting HCA.
>
> CC: Srinivas Eeda <srinivas.e...@oracle.com>
> CC: Junxiao Bi <junxiao...@oracle.com>
> Suggested-by: Håkon Bugge <haakon.bu...@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun....@oracle.com>

Tariq, I'm assuming you'll take this in and send it to me later.

Thanks.
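As a rough illustration of the error-code dispatch the quoted patch describes, here is a minimal userspace C sketch. Every name in it (`struct fake_dev`, `fake_cmd_post`, `fake_cmd_poll`, `fake_cmd_reset_flow`) is hypothetical. The names only mirror the roles that mlx4_cmd_post, mlx4_cmd_poll and mlx4_cmd_reset_flow play in the description. This is not the mlx4 driver code or the actual patch, and the `io_error` flag is an assumed way to simulate a genuine I/O failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not an mlx4 structure. */
struct fake_dev {
    bool offline;     /* HCA went offline (faulty cable, firmware error) */
    bool io_error;    /* assumed flag: simulate a genuine I/O failure */
    int reset_count;  /* how many times the reset flow ran */
};

/* Hypothetical analogue of mlx4_cmd_post(): with the patch applied,
 * an offline device yields -EINVAL rather than -EIO. */
static int fake_cmd_post(struct fake_dev *dev)
{
    if (dev->offline)
        return -EINVAL;
    if (dev->io_error)
        return -EIO;
    return 0;
}

/* Hypothetical analogue of mlx4_cmd_reset_flow(). The assert documents
 * the invariant the patch is after: never reset an already-offline HCA. */
static void fake_cmd_reset_flow(struct fake_dev *dev)
{
    assert(!dev->offline);
    dev->reset_count++;
}

/* Hypothetical analogue of mlx4_cmd_poll(): dispatch on the post error. */
static int fake_cmd_poll(struct fake_dev *dev)
{
    int err = fake_cmd_post(dev);

    if (err == -EINVAL)
        return err;               /* device offline: return directly */
    if (err == -EIO)
        fake_cmd_reset_flow(dev); /* real I/O failure: run the reset flow */
    return err;
}
```

The point of returning a distinct error code is that the caller can tell "the device is already gone" apart from "the command failed on a live device", and only the latter triggers a reset.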