Hello,

I'm trying to use ZFS-backed DRBD storage on Proxmox 5.1. Everything is working fine and the integration is great, but I'm facing some serious stability issues under heavy write load.
Basically, under heavy write load a node crashes, especially (but not only) during VM live migration. It seems to happen only during dual-primary operation, which is required for VM live migration. The problem occurs with both DRBD 8.4.10 and 9.0.12.

Is anybody out there facing similar problems?

Thanks.

This is the crash message:

May 9 12:17:33 pve-LAB2 kernel: [ 949.958947] Oops: 0003 [#1] SMP PTI
May 9 12:17:33 pve-LAB2 kernel: [ 949.958960] Modules linked in: ip_set ip6table_filter ip6_tables iptable_filter softdog bonding nfnetlink_log nfnetlink ipmi_ssif intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mxm_wmi kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ast ttm aesni_intel aes_x86_64 crypto_simd drm_kms_helper glue_helper snd_pcm cryptd snd_timer drm snd intel_cstate soundcore fb_sys_fops syscopyarea mei_me sysfillrect pcspkr intel_rapl_perf input_leds joydev sysimgblt lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drbd sunrpc lru_cache libcrc32c ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO)
May 9 12:17:33 pve-LAB2 kernel: [ 949.959231] zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid igb ixgbe i2c_algo_bit mpt3sas dca ahci raid_class ptp i2c_i801 libahci mdio pps_core scsi_transport_sas
May 9 12:17:33 pve-LAB2 kernel: [ 949.959305] CPU: 0 PID: 18178 Comm: drbd_r_Test1-2 Tainted: P O 4.15.17-1-pve #1
May 9 12:17:33 pve-LAB2 kernel: [ 949.959332] Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
May 9 12:17:33 pve-LAB2 kernel: [ 949.959358] RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
May 9 12:17:33 pve-LAB2 kernel: [ 949.959374] RSP: 0018:ffff9e6a086a3ca0 EFLAGS: 00010282
May 9 12:17:33 pve-LAB2 kernel: [ 949.959392] RAX: 0000000000000000 RBX: ffff8cf5ca2bb200 RCX: ffffffffc057afcf
May 9 12:17:33 pve-LAB2 kernel: [ 949.959415] RDX: 0000000000000000 RSI: ffff8cf5ca2bb208 RDI: ffff8cf5ef7d7160
May 9 12:17:33 pve-LAB2 kernel: [ 949.959437] RBP: ffff9e6a086a3cf0 R08: ffffffffc057afce R09: ffff8cf5fec07180
May 9 12:17:33 pve-LAB2 kernel: [ 949.959461] R10: ffff8cf5ca2bb200 R11: 0000000000000000 R12: ffff8cf5ef7d7130
May 9 12:17:33 pve-LAB2 kernel: [ 949.959483] R13: ffff8cf5792f2b00 R14: 0000000000000000 R15: 0000000000000000
May 9 12:17:33 pve-LAB2 kernel: [ 949.959507] FS: 0000000000000000(0000) GS:ffff8cf5ff200000(0000) knlGS:0000000000000000
May 9 12:17:33 pve-LAB2 kernel: [ 949.959532] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 9 12:17:33 pve-LAB2 kernel: [ 949.959552] CR2: ffffffffc057afce CR3: 0000001b7ac0a001 CR4: 00000000003626f0
May 9 12:17:33 pve-LAB2 kernel: [ 949.959575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 9 12:17:33 pve-LAB2 kernel: [ 949.959598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 9 12:17:33 pve-LAB2 kernel: [ 949.959621] Call Trace:
May 9 12:17:33 pve-LAB2 kernel: [ 949.959677] ? zfs_range_lock+0x4bf/0x5c0 [zfs]
May 9 12:17:33 pve-LAB2 kernel: [ 949.959699] ? spl_kmem_alloc+0xae/0x190 [spl]
May 9 12:17:33 pve-LAB2 kernel: [ 949.959744] zvol_request+0x16e/0x300 [zfs]
May 9 12:17:33 pve-LAB2 kernel: [ 949.959764] generic_make_request+0x123/0x2f0
May 9 12:17:33 pve-LAB2 kernel: [ 949.959781] submit_bio+0x73/0x150
May 9 12:17:33 pve-LAB2 kernel: [ 949.959794] ? submit_bio+0x73/0x150
May 9 12:17:33 pve-LAB2 kernel: [ 949.959812] ? receive_Barrier+0x147/0x3c0 [drbd]
May 9 12:17:33 pve-LAB2 kernel: [ 949.959832] receive_Barrier+0x1d6/0x3c0 [drbd]
May 9 12:17:33 pve-LAB2 kernel: [ 949.960673] ? drbd_bump_write_ordering+0x350/0x350 [drbd]
May 9 12:17:33 pve-LAB2 kernel: [ 949.961526] drbd_receiver+0x1ad/0x320 [drbd]
May 9 12:17:33 pve-LAB2 kernel: [ 949.962386] drbd_thread_setup+0x58/0x140 [drbd]
May 9 12:17:33 pve-LAB2 kernel: [ 949.963127] kthread+0x105/0x140
May 9 12:17:33 pve-LAB2 kernel: [ 949.963859] ? drbd_destroy_device+0x2b0/0x2b0 [drbd]
May 9 12:17:33 pve-LAB2 kernel: [ 949.964585] ? kthread_create_worker_on_cpu+0x70/0x70
May 9 12:17:33 pve-LAB2 kernel: [ 949.965314] ? kthread_create_worker_on_cpu+0x70/0x70
May 9 12:17:33 pve-LAB2 kernel: [ 949.966153] ret_from_fork+0x35/0x40
May 9 12:17:33 pve-LAB2 kernel: [ 949.966874] Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 01 46 c0 89 d1 83 e1 03 83
May 9 12:17:33 pve-LAB2 kernel: [ 949.969131] CR2: ffffffffc057afce
May 9 12:17:33 pve-LAB2 kernel: [ 949.969890] ---[ end trace 39ccea975700982e ]---

_______________________________________
Massimo De Nadal
Digital System srl
Via E.B. Mondin 7 - 32100 - Belluno (Italy)
tel. +39.0437.296539 - fax +39.0437.917154
sip:[email protected]
email:[email protected]
http://www.digital-system.it
_______________________________________
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
