Hello,

I'm trying to use ZFS-backed DRBD storage on Proxmox 5.1.
Everything works fine and the integration is great, but I'm facing some serious
stability issues under heavy write load.

Basically, under heavy write load a node crashes, especially (but not only)
during VM live migration.
It seems to happen only during dual-primary operation (which is
necessary for VM live migration).
The problem happens on both DRBD 8.4.10 and 9.0.12.

Is anybody out there facing similar problems?
Thanks.


This is the crash message:

May  9 12:17:33 pve-LAB2 kernel: [  949.958947] Oops: 0003 [#1] SMP PTI
May  9 12:17:33 pve-LAB2 kernel: [  949.958960] Modules linked in: ip_set ip6table_filter ip6_tables iptable_filter softdog bonding nfnetlink_log nfnetlink ipmi_ssif intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel mxm_wmi kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ast ttm aesni_intel aes_x86_64 crypto_simd drm_kms_helper glue_helper snd_pcm cryptd snd_timer drm snd intel_cstate soundcore fb_sys_fops syscopyarea mei_me sysfillrect pcspkr intel_rapl_perf input_leds joydev sysimgblt lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drbd sunrpc lru_cache libcrc32c ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO)
May  9 12:17:33 pve-LAB2 kernel: [  949.959231]  zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid igb ixgbe i2c_algo_bit mpt3sas dca ahci raid_class ptp i2c_i801 libahci mdio pps_core scsi_transport_sas
May  9 12:17:33 pve-LAB2 kernel: [  949.959305] CPU: 0 PID: 18178 Comm: drbd_r_Test1-2 Tainted: P           O     4.15.17-1-pve #1
May  9 12:17:33 pve-LAB2 kernel: [  949.959332] Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
May  9 12:17:33 pve-LAB2 kernel: [  949.959358] RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
May  9 12:17:33 pve-LAB2 kernel: [  949.959374] RSP: 0018:ffff9e6a086a3ca0 EFLAGS: 00010282
May  9 12:17:33 pve-LAB2 kernel: [  949.959392] RAX: 0000000000000000 RBX: ffff8cf5ca2bb200 RCX: ffffffffc057afcf
May  9 12:17:33 pve-LAB2 kernel: [  949.959415] RDX: 0000000000000000 RSI: ffff8cf5ca2bb208 RDI: ffff8cf5ef7d7160
May  9 12:17:33 pve-LAB2 kernel: [  949.959437] RBP: ffff9e6a086a3cf0 R08: ffffffffc057afce R09: ffff8cf5fec07180
May  9 12:17:33 pve-LAB2 kernel: [  949.959461] R10: ffff8cf5ca2bb200 R11: 0000000000000000 R12: ffff8cf5ef7d7130
May  9 12:17:33 pve-LAB2 kernel: [  949.959483] R13: ffff8cf5792f2b00 R14: 0000000000000000 R15: 0000000000000000
May  9 12:17:33 pve-LAB2 kernel: [  949.959507] FS:  0000000000000000(0000) GS:ffff8cf5ff200000(0000) knlGS:0000000000000000
May  9 12:17:33 pve-LAB2 kernel: [  949.959532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  9 12:17:33 pve-LAB2 kernel: [  949.959552] CR2: ffffffffc057afce CR3: 0000001b7ac0a001 CR4: 00000000003626f0
May  9 12:17:33 pve-LAB2 kernel: [  949.959575] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May  9 12:17:33 pve-LAB2 kernel: [  949.959598] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May  9 12:17:33 pve-LAB2 kernel: [  949.959621] Call Trace:
May  9 12:17:33 pve-LAB2 kernel: [  949.959677]  ? zfs_range_lock+0x4bf/0x5c0 [zfs]
May  9 12:17:33 pve-LAB2 kernel: [  949.959699]  ? spl_kmem_alloc+0xae/0x190 [spl]
May  9 12:17:33 pve-LAB2 kernel: [  949.959744]  zvol_request+0x16e/0x300 [zfs]
May  9 12:17:33 pve-LAB2 kernel: [  949.959764]  generic_make_request+0x123/0x2f0
May  9 12:17:33 pve-LAB2 kernel: [  949.959781]  submit_bio+0x73/0x150
May  9 12:17:33 pve-LAB2 kernel: [  949.959794]  ? submit_bio+0x73/0x150
May  9 12:17:33 pve-LAB2 kernel: [  949.959812]  ? receive_Barrier+0x147/0x3c0 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.959832]  receive_Barrier+0x1d6/0x3c0 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.960673]  ? drbd_bump_write_ordering+0x350/0x350 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.961526]  drbd_receiver+0x1ad/0x320 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.962386]  drbd_thread_setup+0x58/0x140 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.963127]  kthread+0x105/0x140
May  9 12:17:33 pve-LAB2 kernel: [  949.963859]  ? drbd_destroy_device+0x2b0/0x2b0 [drbd]
May  9 12:17:33 pve-LAB2 kernel: [  949.964585]  ? kthread_create_worker_on_cpu+0x70/0x70
May  9 12:17:33 pve-LAB2 kernel: [  949.965314]  ? kthread_create_worker_on_cpu+0x70/0x70
May  9 12:17:33 pve-LAB2 kernel: [  949.966153]  ret_from_fork+0x35/0x40
May  9 12:17:33 pve-LAB2 kernel: [  949.966874] Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 01 46 c0 89 d1 83 e1 03 83
May  9 12:17:33 pve-LAB2 kernel: [  949.969131] CR2: ffffffffc057afce
May  9 12:17:33 pve-LAB2 kernel: [  949.969890] ---[ end trace 39ccea975700982e ]---


_______________________________________
Massimo De Nadal

Digital System srl
Via E.B. Mondin 7 - 32100 - Belluno (Italy)
tel. +39.0437.296539 - fax +39.0437.917154
http://www.digital-system.it
_______________________________________