Hello, We are experiencing problems with our DRBD replication system.
We upgraded from Ubuntu 12 to Ubuntu 14.04 (kernel 4.4.0-31-generic).

DRBD module version: 8.4.5
DRBD userland version: 8.4.4

We often see disconnections between the primary and secondary nodes. Here is the log from when the problem occurs:

Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.085940] BUG: unable to handle kernel paging request at 0000000000001000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.085984] IP: [<ffffffff813e1d96>] memcpy_orig+0x16/0x110
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086046] PGD 4f48b067 PUD 4f458067 PMD 0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086075] Oops: 0000 [#1] SMP
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086097] Modules linked in: nfsv3 drbd vmw_vsock_vmci_transport vsock lru_cache libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache isofs coretemp crct10dif_pclmul crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul vmw_balloon glue_helper ablk_helper cryptd input_leds joydev serio_raw vmwgfx ttm drm_kms_helper 8250_fintek parport_pc drm fb_sys_fops vmw_vmci syscopyarea shpchp sysfillrect i2c_piix4 sysimgblt mac_hid lp parport psmouse mptspi e1000 mptscsih mptbase scsi_transport_spi pata_acpi floppy vmw_pvscsi fjes vmxnet3 [last unloaded: drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086364] CPU: 0 PID: 11696 Comm: drbd_w_r0 Not tainted 4.4.0-31-generic #50~14.04.1-Ubuntu
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086388] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086418] task: ffff880077e52940 ti: ffff880078490000 task.ti: ffff880078490000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086439] RIP: 0010:[<ffffffff813e1d96>] [<ffffffff813e1d96>] memcpy_orig+0x16/0x110
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086477] RSP: 0018:ffff880078493ae0 EFLAGS: 00010202
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086500] RAX: ffff880077f523ec RBX: 00000000000002d4 RCX: 00000000000002bc
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086521] RDX: 0000000000000294 RSI: 0000000000001000 RDI: ffff880077f523ec
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086541] RBP: ffff880078493b20 R08: ffff880078493c30 R09: 352e31303a31303a
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086564] R10: 302b3334352e3130 R11: 6e6f635b20303032 R12: 0000000000000590
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086594] R13: ffff880078493c50 R14: ffff880078493c60 R15: 0000000000000000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.087575] FS: 0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.088273] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.089051] CR2: 0000000000001000 CR3: 000000004f46d000 CR4: 00000000000406f0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.089899] Stack:
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.090670] ffffffff813e65b7 ffff880078493c30 ffff880077f526c0 ffff880078675a00
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.091474] ffff880077ecbc00 000000000000faf0 ffff880078493c40 0000000000000590
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.092233] ffff880078493bb0 ffffffff8173f72d ffffffff8137c2b8 ffff880077e53218
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.092980] Call Trace:
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.093702] [<ffffffff813e65b7>] ? copy_from_iter+0x2b7/0x2d0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.094415] [<ffffffff8173f72d>] tcp_sendmsg+0x61d/0xad0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.095030] [<ffffffff8137c2b8>] ? aa_sk_perm+0x78/0x230
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.095747] [<ffffffff81769937>] inet_sendmsg+0x67/0xa0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.096335] [<ffffffff816d6d58>] sock_sendmsg+0x38/0x50
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.096982] [<ffffffff816d6e6b>] kernel_sendmsg+0x2b/0x30
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.097549] [<ffffffffc0438c40>] drbd_send+0xc0/0x1b0 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.098159] [<ffffffffc043a641>] _drbd_no_send_page.isra.38+0x71/0xb0 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.098704] [<ffffffffc043ab6c>] drbd_send_dblock+0x33c/0x630 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.099232] [<ffffffff810bd784>] ? __wake_up+0x44/0x50
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.099756] [<ffffffffc0420bf4>] w_send_dblock+0x94/0x150 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.100284] [<ffffffffc0421eca>] drbd_worker+0xea/0x350 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.100996] [<ffffffffc0437210>] ? drbd_destroy_connection+0x160/0x160 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.101562] [<ffffffffc043725b>] drbd_thread_setup+0x4b/0x130 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.102120] [<ffffffffc0437210>] ? drbd_destroy_connection+0x160/0x160 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.102612] [<ffffffff8109b849>] kthread+0xc9/0xe0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.103108] [<ffffffff8109b780>] ? kthread_park+0x60/0x60
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.103573] [<ffffffff817f72cf>] ret_from_fork+0x3f/0x70
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.104023] [<ffffffff8109b780>] ? kthread_park+0x60/0x60
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.104461] Code: 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.105828] RIP [<ffffffff813e1d96>] memcpy_orig+0x16/0x110
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.106322] RSP <ffff880078493ae0>
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.106766] CR2: 0000000000001000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.108163] ---[ end trace 4291e6854eec7dff ]---
Apr 28 05:31:46 XXXXXXXXXXXX kernel: [55587.826993] block drbd1: Remote failed to finish a request within ko-count * timeout
Apr 28 05:31:46 XXXXXXXXXXXX kernel: [55587.828030] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: name lookup failed for XXX.XX.XXX.XXX: Temporary failure in name resolution
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: connect from UNKNOWN (XXX.XX.XXX.XXX)
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: rsync to log/XXXXXXXXXXXX from UNKNOWN (XXX.XX.XXX.XXX)
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: receiving file list

Have you ever seen this kind of problem? Do we have to downgrade the kernel or upgrade DRBD?

Thanks in advance for your help.

Regards,
Amos
