Le mardi 25 mars 2014 à 09:39 +0100, Olivier Bonvalet a écrit :
> Hi,
> 
> what can/should I do to help fix that problem ?
> 
> for now, RBD kernel client hang on : 
>         Assertion failure in rbd_img_obj_callback() at line 2131:
>            rbd_assert(which >= img_request->next_completion);
> 
> or on :
>         Assertion failure in rbd_img_obj_callback() at line 2127:
>             rbd_assert(img_request != NULL);
> 
> 
> I have both case at least once per week, on latest 3.13.5 kernels.
> 
> It seems that the problem occurs only on more loaded servers (I have 4
> near same servers, and crash occurs on two of them. If I move the VM,
> crash follows...).
> 
> Olivier
> 
> --

Hi,

so. After some days without any problems, RBD crashed toonight :

Apr  5 02:52:24 rurkh kernel: [799426.461742] 
Apr  5 02:52:24 rurkh kernel: [799426.461742] Assertion failure in 
rbd_img_obj_callback() at line 2128:
Apr  5 02:52:24 rurkh kernel: [799426.461742] 
Apr  5 02:52:24 rurkh kernel: [799426.461742]   
rbd_assert(img_request->obj_request_count > 0);
Apr  5 02:52:24 rurkh kernel: [799426.461742] 
Apr  5 02:52:24 rurkh kernel: [799426.461958] ------------[ cut here 
]------------
Apr  5 02:52:24 rurkh kernel: [799426.461997] kernel BUG at 
drivers/block/rbd.c:2128!
Apr  5 02:52:24 rurkh kernel: [799426.462036] invalid opcode: 0000 [#1] SMP 
Apr  5 02:52:24 rurkh kernel: [799426.462080] Modules linked in: cbc rbd 
libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs
 libcrc32c bridge loop iTCO_wdt gpio_ich iTCO_vendor_support serio_raw sb_edac 
edac_core evdev i2c_i801 lpc_ich mfd_core ioatdma shpchp ipmi
_si ipmi_msghandler wmi ac button dm_mod hid_generic usbhid hid sg sd_mod 
crc_t10dif crct10dif_common isci ahci ehci_pci libsas libahci mega
raid_sas ehci_hcd libata scsi_transport_sas igb usbcore scsi_mod i2c_algo_bit 
ixgbe i2c_core usb_common dca ptp pps_core mdio
Apr  5 02:52:24 rurkh kernel: [799426.462579] CPU: 0 PID: 15975 Comm: 
kworker/0:0 Not tainted 3.13-dae-dom0 #24
Apr  5 02:52:24 rurkh kernel: [799426.462644] Hardware name: Supermicro 
X9DRW-7TPF+/X9DRW-7TPF+, BIOS 3.0 07/24/2013
Apr  5 02:52:24 rurkh kernel: [799426.462717] Workqueue: ceph-msgr con_work 
[libceph]
Apr  5 02:52:24 rurkh kernel: [799426.462759] task: ffff88024cd9a8a0 ti: 
ffff88021a4e4000 task.ti: ffff88021a4e4000
Apr  5 02:52:24 rurkh kernel: [799426.462825] RIP: e030:[<ffffffffa0305ae8>]  
[<ffffffffa0305ae8>] rbd_img_obj_callback+0x91/0x3a2 [rbd]
Apr  5 02:52:24 rurkh kernel: [799426.462901] RSP: e02b:ffff88021a4e5ce8  
EFLAGS: 00010282
Apr  5 02:52:24 rurkh kernel: [799426.462940] RAX: 000000000000006d RBX: 
ffff88023f8f6ec8 RCX: 0000000000000000
Apr  5 02:52:24 rurkh kernel: [799426.463005] RDX: ffff88027fe0eb50 RSI: 
ffff88027fe0e1a8 RDI: ffff88021a4e02a8
Apr  5 02:52:24 rurkh kernel: [799426.463069] RBP: ffff88021c90a718 R08: 
0000000000000000 R09: 0000000000000000
Apr  5 02:52:24 rurkh kernel: [799426.463134] R10: 0000000000000000 R11: 
000000000000084e R12: 0000000000000001
Apr  5 02:52:24 rurkh kernel: [799426.463197] R13: 0000000000000000 R14: 
ffff88025584a130 R15: 0000000000000000
Apr  5 02:52:24 rurkh kernel: [799426.481060] FS:  00007f1c6138f720(0000) 
GS:ffff88027fe00000(0000) knlGS:0000000000000000
Apr  5 02:52:24 rurkh kernel: [799426.481130] CS:  e033 DS: 0000 ES: 0000 CR0: 
0000000080050033
Apr  5 02:52:24 rurkh kernel: [799426.481170] CR2: 00007f1c6139f000 CR3: 
000000023825c000 CR4: 0000000000042660
Apr  5 02:52:24 rurkh kernel: [799426.481235] Stack:
Apr  5 02:52:24 rurkh kernel: [799426.481266]  000000000000000d 
ffff880254da107d ffffffffffffffff ffff880254da1048
Apr  5 02:52:24 rurkh kernel: [799426.481349]  ffff88025584a128 
ffff88026dc59718 0000000000000000 ffff88025584a130
Apr  5 02:52:24 rurkh kernel: [799426.481429]  0000000000000000 
ffffffffa02e4595 0000000000000015 ffff88026dc59770
Apr  5 02:52:24 rurkh kernel: [799426.481510] Call Trace:
Apr  5 02:52:24 rurkh kernel: [799426.481554]  [<ffffffffa02e4595>] ? 
dispatch+0x3e4/0x55e [libceph]
Apr  5 02:52:24 rurkh kernel: [799426.481600]  [<ffffffffa02df0fc>] ? 
con_work+0xf6e/0x1a65 [libceph]
Apr  5 02:52:24 rurkh kernel: [799426.481646]  [<ffffffff81051f83>] ? 
mmdrop+0xd/0x1c
Apr  5 02:52:24 rurkh kernel: [799426.481687]  [<ffffffff8105265e>] ? 
finish_task_switch+0x4d/0x83
Apr  5 02:52:24 rurkh kernel: [799426.481732]  [<ffffffff810484d7>] ? 
process_one_work+0x15a/0x214
Apr  5 02:52:24 rurkh kernel: [799426.481775]  [<ffffffff8104895b>] ? 
worker_thread+0x139/0x1de
Apr  5 02:52:24 rurkh kernel: [799426.481817]  [<ffffffff81048822>] ? 
rescuer_thread+0x26e/0x26e
Apr  5 02:52:24 rurkh kernel: [799426.481859]  [<ffffffff8104cff6>] ? 
kthread+0x9e/0xa6
Apr  5 02:52:24 rurkh kernel: [799426.481900]  [<ffffffff8104cf58>] ? 
__kthread_parkme+0x55/0x55
Apr  5 02:52:24 rurkh kernel: [799426.481944]  [<ffffffff8137260c>] ? 
ret_from_fork+0x7c/0xb0
Apr  5 02:52:24 rurkh kernel: [799426.481985]  [<ffffffff8104cf58>] ? 
__kthread_parkme+0x55/0x55
Apr  5 02:52:24 rurkh kernel: [799426.482025] Code: 26 06 e1 0f 0b 8b 45 5c 85 
c0 75 21 48 c7 c1 66 88 30 a0 ba 50 08 00 00 48 c7 c6 50 99 30 a0 48 c7 c7 1f 
81 30 a0 e8 5b 26 06 e1 <0f> 0b 41 83 fc ff 75 23 48 c7 c1 f4 8b 30 a0 ba 51 08 
00 00 31 
Apr  5 02:52:24 rurkh kernel: [799426.482413] RIP  [<ffffffffa0305ae8>] 
rbd_img_obj_callback+0x91/0x3a2 [rbd]
Apr  5 02:52:24 rurkh kernel: [799426.482462]  RSP <ffff88021a4e5ce8>
Apr  5 02:52:24 rurkh kernel: [799426.483907] ---[ end trace 4aea8b8c107c24be 
]---


At this time there was a lot of IO, because of backups in VM.
(but no RBD snapshot create or remove)


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to