Re: NULL pointer deref in k.o/for-4.5

2016-01-06 Thread Chuck Lever

> On Jan 6, 2016, at 1:16 PM, Chuck Lever  wrote:
> 
> Encountered the below just after booting my NFS/RDMA
> server with 4.4.0-rc6-00011-g6948cb2 (k.o/for-4.5 plus
> my NFS/RDMA for-4.5 patches). The system is up and
> ping-able via eth0, but high-level networking (like sshd
> and nfsd) does not work, and my ib0 i/f is missing.
> 
> This is an x86_64 system with one CX-3 Pro HCA.

And appears to be 100% reproducible. Any debugging
advice welcome!


> All seems well with a stock v4.4-rc4 kernel.
> 
> 
> Jan  6 12:44:13 klimt kernel:  mlx4_ib_add: mlx4_ib: Mellanox 
> ConnectX InfiniBand driver v2.2-1 (Feb 2014)
> Jan  6 12:44:13 klimt kernel:  mlx4_ib_add: counter index 0 for port 
> 1 allocated 0
> Jan  6 12:44:13 klimt kernel: BUG: unable to handle kernel NULL pointer 
> dereference at   (null)
> Jan  6 12:44:13 klimt kernel: IP: [] 
> __mutex_lock_slowpath+0x75/0x120
> Jan  6 12:44:13 klimt kernel: PGD 853947067 PUD 8546cb067 PMD 0 
> Jan  6 12:44:13 klimt kernel: Oops: 0002 [#1] SMP 
> Jan  6 12:44:13 klimt kernel: Modules linked in: mlx4_ib(+) mlx4_en ib_sa 
> ib_mad ib_core vxlan ip6_udp_tunnel udp_tunnel ib_addr sr_mod cdrom sd_mod 
> ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
> mlx4_core igb ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core 
> dm_mirror dm_region_hash dm_log dm_mod
> Jan  6 12:44:13 klimt kernel: CPU: 3 PID: 431 Comm: modprobe Not tainted 
> 4.4.0-rc6-00011-g6948cb2 #79
> Jan  6 12:44:13 klimt kernel: Hardware name: Supermicro Super 
> Server/X10SRL-F, BIOS 1.0c 09/09/2015
> Jan  6 12:44:13 klimt kernel: task: 88085571aa80 ti: 88084f414000 
> task.ti: 88084f414000
> Jan  6 12:44:13 klimt kernel: RIP: 0010:[]  
> [] __mutex_lock_slowpath+0x75/0x120
> Jan  6 12:44:13 klimt kernel: RSP: 0018:88084f417810  EFLAGS: 00010282
> Jan  6 12:44:13 klimt kernel: RAX:  RBX: 88084f633950 
> RCX: 88085571aa80
> Jan  6 12:44:13 klimt kernel: RDX: 0001 RSI: 88085571aae0 
> RDI: 88084f633954
> Jan  6 12:44:13 klimt kernel: RBP: 88084f417858 R08: 0101 
> R09: 880854f02f00
> Jan  6 12:44:13 klimt kernel: R10: a0150a85 R11: ea002156d400 
> R12: 88084f633954
> Jan  6 12:44:13 klimt kernel: R13: 88085571aa80 R14:  
> R15: 88084f633958
> Jan  6 12:44:13 klimt kernel: FS:  7f32227c0740() 
> GS:88087fcc() knlGS:
> Jan  6 12:44:13 klimt kernel: CS:  0010 DS:  ES:  CR0: 
> 80050033
> Jan  6 12:44:13 klimt kernel: CR2:  CR3: 000853cb6000 
> CR4: 001406e0
> Jan  6 12:44:13 klimt kernel: Stack:
> Jan  6 12:44:13 klimt kernel: 88084f633958  
> 81309502 3b473ac0
> Jan  6 12:44:13 klimt kernel: 88084f633950 88084f417888 
> 88084f633940 88084f633950
> Jan  6 12:44:13 klimt kernel: 88084f63 88084f417870 
> 8165271f 88084f63
> Jan  6 12:44:13 klimt kernel: Call Trace:
> Jan  6 12:44:13 klimt kernel: [] ? 
> get_from_free_list+0x42/0x50
> Jan  6 12:44:13 klimt kernel: [] mutex_lock+0x1f/0x2f
> Jan  6 12:44:13 klimt kernel: [] 
> iboe_process_mad.isra.13+0x77/0x190 [mlx4_ib]
> Jan  6 12:44:13 klimt kernel: [] 
> mlx4_ib_process_mad+0x4d4/0x550 [mlx4_ib]
> Jan  6 12:44:13 klimt kernel: [] ? 
> kernfs_next_descendant_post+0x1a/0x50
> Jan  6 12:44:13 klimt kernel: [] ? 
> kernfs_add_one+0x112/0x150
> Jan  6 12:44:13 klimt kernel: [] ? 
> kmem_cache_alloc_trace+0x3d/0x1d0
> Jan  6 12:44:13 klimt kernel: [] ? get_perf_mad+0x85/0x160 
> [ib_core]
> Jan  6 12:44:13 klimt kernel: [] get_perf_mad+0xee/0x160 
> [ib_core]
> Jan  6 12:44:13 klimt kernel: [] 
> get_counter_table+0x38/0x70 [ib_core]
> Jan  6 12:44:13 klimt kernel: [] ? 
> kmem_cache_alloc_trace+0xf8/0x1d0
> Jan  6 12:44:13 klimt kernel: [] ? add_port+0xc2/0x450 
> [ib_core]
> Jan  6 12:44:13 klimt kernel: [] add_port+0x10f/0x450 
> [ib_core]
> Jan  6 12:44:13 klimt kernel: [] 
> ib_device_register_sysfs+0xe8/0x160 [ib_core]
> Jan  6 12:44:13 klimt kernel: [] 
> ib_register_device+0x320/0x500 [ib_core]
> Jan  6 12:44:13 klimt kernel: [] ? vprintk_default+0x3b/0x40
> Jan  6 12:44:13 klimt kernel: [] ? printk+0x5d/0x74
> Jan  6 12:44:13 klimt kernel: [] mlx4_ib_add+0xbb9/0xfe0 
> [mlx4_ib]
> Jan  6 12:44:13 klimt kernel: [] ? 0xa023f000
> Jan  6 12:44:13 klimt kernel: [] mlx4_add_device+0x3f/0xb0 
> [mlx4_core]
> Jan  6 12:44:13 klimt kernel: [] ? 0xa023f000
> Jan  6 12:44:13 klimt kernel: [] 
> mlx4_register_interface+0xd2/0x100 [mlx4_core]
> Jan  6 12:44:13 klimt kernel: [] mlx4_ib_init+0x4c/0x1000 
> [mlx4_ib]
> Jan  6 12:44:13 klimt kernel: [] do_one_initcall+0x113/0x1f0
> Jan  6 12:44:13 klimt kernel: [] ? __vunmap+0xd7/0x100
> Jan  6 12:44:13 klimt kernel: [] ? 
> kmem_cache_alloc_trace+0x3d/0x1d0
> Jan  6 12:44:13 klimt kernel: [] ? do_init_module+0x27/0x1e8
> Jan  6 12:44:13 klimt kernel: [] 

Re: NULL pointer deref in k.o/for-4.5

2016-01-06 Thread Or Gerlitz
On Wed, Jan 6, 2016 at 9:20 PM, Chuck Lever  wrote:
> And appears to be 100% reproducible. Any debugging
> advice welcome!

This was reported here 2-3 times; this patch fixes it:
https://patchwork.kernel.org/patch/7929551
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NULL pointer deref in k.o/for-4.5

2016-01-06 Thread Or Gerlitz
On Thu, Jan 7, 2016 at 1:31 AM, Chuck Lever  wrote:
>> On Jan 6, 2016, at 5:25 PM, Or Gerlitz  wrote:
>> On Wed, Jan 6, 2016 at 9:20 PM, Chuck Lever wrote:

>> This was reported here 2-3 times; this patch fixes it:
>> https://patchwork.kernel.org/patch/7929551

> Confirmed, that fixes it. Thanks, I never would have
> guessed that was the fix.

Have you tried searching the linux-rdma mailing list for other people hitting failures with the for-4.5 bits?