Ping.

  Any info on this?

  Regards,

  Sébastien.

On Thu, 13 Feb 2014 09:58:35 +0100
Sébastien Dugué <[email protected]> wrote:

>   Hi,
> 
>   I'm currently running tests with a Connect-IB board under the current 
> OFED-3.12 of the day:
> 
>   - compat:   407b205 compat: Add kthread support for kernels <= 2.6.35
>   - compat-rdma: b2bda9f Fixed nfsrdma backport patch name
>   - linux-3.12:       f9e9918 Prepare Linux tree for OFED 3.12
> 
>   
>   the board is:
> 
> # mstflint -d mlx5_0 q
> 
> -W- Running quick query - Skipping full image integrity checks.
> 
> Image type:      FS3
> FW Version:      10.10.2000
> Device ID:       4113
> Chip Revision:   0
> Description:     UID                GuidsNumber  Step
> Base GUID1:      f4521403000bf580        8        1
> Base GUID2:      f4521403000bf588        8        1
> Base MAC1:       0000f452140bf580        8        1
> Base MAC2:       0000f452140bf588        8        1
> Image VSD:       
> Device VSD:      
> PSID:            MT_1220110019
> 
>   When trying to restart the openibd service:
> 
> # service openibd restart
> 
>   here is what I get:
> 
> INFO: task rmmod:22654 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> rmmod         D 0000000000000001     0 22654  22653 0x00000000
>  ffff88106f1b7b58 0000000000000082 0000000000000000 ffffffff81055f76
>  ffff88106f1b7ae8 ffff88107b0bb500 ffff88106f1b7ae8 ffffffff810522fd
>  ffff88107a8e9af8 ffff88106f1b7fd8 000000000000fb88 ffff88107a8e9af8
> Call Trace:
>  [<ffffffff81055f76>] ? enqueue_task+0x66/0x80
>  [<ffffffff810522fd>] ? check_preempt_curr+0x6d/0x90
>  [<ffffffff8150e555>] schedule_timeout+0x215/0x2e0
>  [<ffffffff81096c96>] ? autoremove_wake_function+0x16/0x40
>  [<ffffffff81051419>] ? __wake_up_common+0x59/0x90
>  [<ffffffff8150e1d3>] wait_for_common+0x123/0x180
>  [<ffffffff81063310>] ? default_wake_function+0x0/0x20
>  [<ffffffff810912b1>] ? __queue_work+0x41/0x50
>  [<ffffffff8150e2ed>] wait_for_completion+0x1d/0x20
>  [<ffffffffa05a3d18>] mlx5_cmd_exec+0x2d8/0x790 [mlx5_core]
>  [<ffffffffa05a583e>] mlx5_cmd_teardown_hca+0x5e/0x90 [mlx5_core]
>  [<ffffffffa05a10f9>] mlx5_dev_cleanup+0x69/0xe0 [mlx5_core]
>  [<ffffffffa05da3c9>] remove_one+0x59/0x70 [mlx5_ib]
>  [<ffffffff8129a047>] pci_device_remove+0x37/0x70
>  [<ffffffff8135e8bf>] __device_release_driver+0x6f/0xe0
>  [<ffffffff8135e9f8>] driver_detach+0xc8/0xd0
>  [<ffffffff8135d7fe>] bus_remove_driver+0x8e/0x110
>  [<ffffffff8135f1e2>] driver_unregister+0x62/0xa0
>  [<ffffffff8129a354>] pci_unregister_driver+0x44/0xb0
>  [<ffffffffa05e7349>] __exit_compat+0x15/0xbe [mlx5_ib]
>  [<ffffffff810b4814>] sys_delete_module+0x194/0x260
>  [<ffffffff8151311e>] ? do_page_fault+0x3e/0xa0
>  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> 0000:01:00.0:wait_func:618:(pid 22654): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
> 0000:01:00.0:mlx5_reclaim_startup_pages:419:(pid 22654): FW did not return all pages. giving up...
> 0000:01:00.0:wait_func:618:(pid 22654): MLX5_CMD_OP_DISABLE_HCA(0x105) timeout. Will cause a leak of a command resource
> Compat-rdma backport release: 435a602-c
> Backport based on linux-3.12 385a572
> compat.git: linux-3.12
> mlx5_ib: Mellanox Connect-IB Infiniband driver v1.0 (June 2013)
> mlx5_ib 0000:01:00.0: firmware version: 10.10.2000
> 0000:01:00.0:wait_func:618:(pid 25331): MLX5_CMD_OP_ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource
> mlx5_ib 0000:01:00.0: enable hca failed
> mlx5_ib: probe of 0000:01:00.0 failed with error -110
> 
> 
>   It looks like the driver fails to tear down the HCA, leaving the device in
> a completely unstable state needing a reboot.
> 
>   This behaviour is fully reproducible, although it _may_ succeed once or
> twice right after boot.
> 
>   Is this a FW problem or a driver problem?
> 
>   thanks,
> 
>   Sébastien.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
