Hi Digimer
It is not related to the size of the drbd device. It is related to TRIM commands:
The bug occurs in this setup:
- source-drbd on thin provisioned LVM
- target-drbd on "classic" LVM
- virtual machine on source-drbd issues a TRIM command (e.g. fstrim)
Bug does not show up in this setup:
- source-drbd on thin provisioned LVM
- target-drbd on thin provisioned LVM
Bug does not show up in this setup:
- source-drbd on thin provisioned LVM
- target-drbd on "classic" LVM
- virtual machine on source-drbd does no TRIM commands
source-drbd and target-drbd are connected and UpToDate on both ends
(fully synced) at the start of the test.
My current guess is that drbd assumes TRIM is supported on the target
side (because it is on the source side) and then fails in its error handling.
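
To compare the two nodes, one could check whether each backing device
advertises discard (TRIM) support at all. Below is a minimal Python sketch,
assuming the usual Linux sysfs attribute
/sys/block/<dev>/queue/discard_max_bytes, where a value of 0 means the
device does not accept discards. The helper name supports_discard and the
device names are hypothetical and only for illustration; this has not been
run on the affected machines.

```python
from pathlib import Path


def supports_discard(dev_name, sysfs_root="/sys/block"):
    """Return True if the block device advertises discard (TRIM) support.

    dev_name is the kernel name of the backing device, for example "dm-3"
    for an LVM logical volume (hypothetical name, check with lsblk).
    """
    limit_file = Path(sysfs_root) / dev_name / "queue" / "discard_max_bytes"
    try:
        # The kernel reports 0 here when the device cannot discard.
        return int(limit_file.read_text().strip()) > 0
    except (OSError, ValueError):
        # Missing or unreadable attribute: treat as "no discard support".
        return False
```

Calling supports_discard("dm-3") on the source and on the target node would
show whether the two backing volumes differ in this respect, which is the
situation the guess above depends on.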
Cheers,
Patrick
On 04.08.2015 at 16:18, Digimer wrote:
I've used it extensively on arrays up to ~40 TB without issue. So I
suspect there is another problem at play.
Do you have a test environment that you can reproduce this problem in?
If so, then I would recommend testing an upgrade of the userland and the
kernel modules. 8.4.6 is also out and there are a lot of bug fixes from
.3 to .6. I understand wanting to stick with provided packages, which is
why I am asking, as a test, to try the upgrade in a dev environment.
On 04/08/15 10:13 AM, Patrick Feisthammel (Citrin Informatik GmbH) wrote:
Hi Digimer
Version from /proc/drbd is
version: 8.4.3 (api:1/proto:86-101)
The policy is to stay on the packages provided by the platform, if
possible.
Until now it has happened only with one 25 GB partition. But it gives a bad
feeling if drbd can cause repeated reboots of the physical server.
Cheers,
Patrick
On 04.08.2015 at 15:17, Digimer wrote:
What version of DRBD itself? (cat /proc/drbd). Not sure if it will help,
but 8.9.3 is out, can you try upgrading?
On 04/08/15 03:16 AM, Patrick Feisthammel (Citrin Informatik GmbH) wrote:
Hi
We have repeated Oopses with drbd.
Kernel 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17)
x86_64 GNU/Linux
drbd-utils is 8.9.2~rc1-2
This happens on different hardware (same software versions). It seems
to happen only with one specific drbd source.
The kernel Oops is on the receiving side.
More Oopses can be produced if helpful. Any suggestions for solving the issue?
Aug 4 08:23:15 octa12 kernel: [34843.532236] drbd k126nfs1: Starting
worker thread (from drbdsetup-84 [7570])
Aug 4 08:23:15 octa12 kernel: [34843.534488] block drbd133: disk(
Diskless -> Attaching )
Aug 4 08:23:15 octa12 kernel: [34843.535065] drbd k126nfs1: Method to
ensure write ordering: drain
Aug 4 08:23:15 octa12 kernel: [34843.535069] block drbd133: max BIO
size = 4096
Aug 4 08:23:15 octa12 kernel: [34843.535074] block drbd133:
drbd_bm_resize called with capacity == 52427128
Aug 4 08:23:15 octa12 kernel: [34843.535283] block drbd133: resync
bitmap: bits=6553391 words=102397 pages=200
Aug 4 08:23:15 octa12 kernel: [34843.535285] block drbd133: size = 25
GB (26213564 KB)
Aug 4 08:23:15 octa12 kernel: [34843.535345] block drbd133: Writing the
whole bitmap, size changed
Aug 4 08:23:15 octa12 kernel: [34843.537060] block drbd133: bitmap
WRITE of 200 pages took 0 jiffies
Aug 4 08:23:15 octa12 kernel: [34843.537064] block drbd133: 25 GB
(6553391 bits) marked out-of-sync by on disk bit-map.
Aug 4 08:23:15 octa12 kernel: [34843.539599] block drbd133: bitmap READ
of 200 pages took 1 jiffies
Aug 4 08:23:15 octa12 kernel: [34843.539730] block drbd133: recounting
of set bits took additional 0 jiffies
Aug 4 08:23:15 octa12 kernel: [34843.539732] block drbd133: 25 GB
(6553391 bits) marked out-of-sync by on disk bit-map.
Aug 4 08:23:15 octa12 kernel: [34843.539741] block drbd133: Suspended
AL updates
Aug 4 08:23:15 octa12 kernel: [34843.539744] block drbd133: disk(
Attaching -> Inconsistent )
Aug 4 08:23:15 octa12 kernel: [34843.539746] block drbd133: attached to
UUIDs
0000000000000004:0000000000000000:0000000000000000:0000000000000000
Aug 4 08:23:15 octa12 kernel: [34843.541494] drbd k126nfs1: conn(
StandAlone -> Unconnected )
Aug 4 08:23:15 octa12 kernel: [34843.541509] drbd k126nfs1: Starting
receiver thread (from drbd_w_k126nfs1 [7572])
Aug 4 08:23:15 octa12 kernel: [34843.543448] drbd k126nfs1: receiver
(re)started
Aug 4 08:23:15 octa12 kernel: [34843.543459] drbd k126nfs1: conn(
Unconnected -> WFConnection )
Aug 4 08:23:19 octa12 kernel: [34847.542570] drbd k126nfs1: Handshake
successful: Agreed network protocol version 101
Aug 4 08:23:19 octa12 kernel: [34847.542572] drbd k126nfs1: Agreed to
support TRIM on protocol level
Aug 4 08:23:19 octa12 kernel: [34847.542601] drbd k126nfs1: conn(
WFConnection -> WFReportParams )
Aug 4 08:23:19 octa12 kernel: [34847.542603] drbd k126nfs1: Starting
asender thread (from drbd_r_k126nfs1 [7577])
Aug 4 08:23:19 octa12 kernel: [34847.558766] block drbd133: max BIO
size = 286720
Aug 4 08:23:19 octa12 kernel: [34847.558776] block drbd133:
drbd_sync_handshake:
Aug 4 08:23:19 octa12 kernel: [34847.558779] block drbd133: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:6553391 flags:0
Aug 4 08:23:19 octa12 kernel: [34847.558782] block drbd133: peer
D523A8E0A929C7CF:D9BDE1030AAC1F1B:BFE2851AAA4046D4:BFE1851AAA4046D4
bits:44229 flags:0
Aug 4 08:23:19 octa12 kernel: [34847.558783] block drbd133:
uuid_compare()=-2 by rule 20
Aug 4 08:23:19 octa12 kernel: [34847.558785] block drbd133: Becoming
sync target due to disk states.
Aug 4 08:23:19 octa12 kernel: [34847.558786] block drbd133: Writing the
whole bitmap, full sync required after drbd_sync_handshake.
Aug 4 08:23:19 octa12 kernel: [34847.560780] block drbd133: bitmap
WRITE of 200 pages took 0 jiffies
Aug 4 08:23:19 octa12 kernel: [34847.560784] block drbd133: 25 GB
(6553391 bits) marked out-of-sync by on disk bit-map.
Aug 4 08:23:19 octa12 kernel: [34847.560840] block drbd133: peer(
Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
-> UpToDate )
Aug 4 08:23:19 octa12 kernel: [34847.560844] block drbd133: Resumed AL
updates
Aug 4 08:23:19 octa12 kernel: [34847.588181] block drbd133: receive
bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
compression: 100.0%
Aug 4 08:23:19 octa12 kernel: [34847.588290] block drbd133: send bitmap
stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression:
100.0%
Aug 4 08:23:19 octa12 kernel: [34847.588294] block drbd133: conn(
WFBitMapT -> WFSyncUUID )
Aug 4 08:23:19 octa12 kernel: [34847.591990] block drbd133: updated
sync uuid
D9BEE1030AAC1F1A:0000000000000000:0000000000000000:0000000000000000
Aug 4 08:23:19 octa12 kernel: [34847.592078] block drbd133: helper
command: /sbin/drbdadm before-resync-target minor-133
Aug 4 08:23:19 octa12 kernel: [34847.596515] block drbd133: helper
command: /sbin/drbdadm before-resync-target minor-133 exit code 0 (0x0)
Aug 4 08:23:19 octa12 kernel: [34847.596528] block drbd133: conn(
WFSyncUUID -> SyncTarget )
Aug 4 08:23:19 octa12 kernel: [34847.596537] block drbd133: Began
resync as SyncTarget (will sync 26213564 KB [6553391 bits set]).
Aug 4 08:23:23 octa12 kernel: [34849.338284] PGD 1814067 PUD 281c11067
PMD 281a41067 PTE 8010000079ee3067
Aug 4 08:23:23 octa12 kernel: [34849.338312] Oops: 0011 [#1] SMP
Aug 4 08:23:23 octa12 kernel: [34849.338328] Modules linked in:
xt_comment xt_tcpudp ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_filter ip6_tables nf_nat_ftp xt_REDIRECT
xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT xt_LOG xt_limit
iptable_filter xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables xen_gntdev xen_evtchn
xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd
fscache sunrpc bridge joydev hid_generic iTCO_wdt iTCO_vendor_support
evdev x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
ablk_helper cryptd psmouse serio_raw pcspkr sb_edac edac_core usbhid hid
i2c_i801 ttm drm_kms_helper drm mei_me mei lpc_ich mfd_core ioatdma
shpchp tpm_tis wmi tpm ipmi_si ipmi_msghandler processor thermal_sys
button 8021q garp stp mrp llc drbd lru_cache libcrc32c autofs4 ext4
crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod sg sd_mod crc_t10dif
crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel isci
ahci libahci libsas ehci_pci ehci_hcd megaraid_sas usbcore usb_common
libata scsi_transport_sas scsi_mod igb i2c_algo_bit i2c_core dca ptp
pps_core
Aug 4 08:23:23 octa12 kernel: [34849.338919] CPU: 1 PID: 7584 Comm:
drbd_a_k126nfs1 Not tainted 3.16.0-4-amd64 #1 Debian
3.16.7-ckt11-1+deb8u2
Aug 4 08:23:23 octa12 kernel: [34849.338970] Hardware name:
Thomas-Krenn.AG X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS
3.2 01/15/2015
Aug 4 08:23:23 octa12 kernel: [34849.339021] task: ffff8801ff59f570 ti:
ffff880079ee0000 task.ti: ffff880079ee0000
Aug 4 08:23:23 octa12 kernel: [34849.339067] RIP:
e030:[<ffff880079ee3d88>] [<ffff880079ee3d88>] 0xffff880079ee3d88
Aug 4 08:23:23 octa12 kernel: [34849.339116] RSP:
e02b:ffff880079ee3d90 EFLAGS: 00010212
Aug 4 08:23:23 octa12 kernel: [34849.339144] RAX: 00000000fffffffc RBX:
ffffffffffffffff RCX: 0000000000001ab3
Aug 4 08:23:23 octa12 kernel: [34849.339174] RDX: 0000000000001ab3 RSI:
00000000fffffe01 RDI: ffffffff81463f75
Aug 4 08:23:23 octa12 kernel: [34849.339205] RBP: ffff8801ff59f570 R08:
ffff880079ee0000 R09: 0000000000000000
Aug 4 08:23:23 octa12 kernel: [34849.339236] R10: ffff8802001f4810 R11:
0000000000000000 R12: 0000000000000001
Aug 4 08:23:23 octa12 kernel: [34849.339267] R13: 0000000000000000 R14:
0000000000000010 R15: ffff8801ff729800
Aug 4 08:23:23 octa12 kernel: [34849.339302] FS: 0000000000000000(0000)
GS:ffff880274640000(0000) knlGS:0000000000000000
Aug 4 08:23:23 octa12 kernel: [34849.339349] CS: e033 DS: 0000 ES:
0000 CR0: 0000000080050033
Aug 4 08:23:23 octa12 kernel: [34849.339378] CR2: ffff880079ee3d88 CR3:
00000001fe428000 CR4: 0000000000042660
Aug 4 08:23:23 octa12 kernel: [34849.339409] Stack:
Aug 4 08:23:23 octa12 kernel: [34849.339431] ffff880079ee3d88
0000000000000010 0000000000000000 0000000000000000
Aug 4 08:23:23 octa12 kernel: [34849.339487] ffff880079ee3d90
0000000000000001 0000000000000000 0000000000000000
Aug 4 08:23:23 octa12 kernel: [34849.339543] 0000000000004100
ffffffffa039a7be ffff8801ff729880 0000001000000000
Aug 4 08:23:23 octa12 kernel: [34849.339600] Call Trace:
Aug 4 08:23:23 octa12 kernel: [34849.339632] [<ffffffffa039a7be>] ?
drbd_asender+0x27e/0x750 [drbd]
Aug 4 08:23:23 octa12 kernel: [34849.339667] [<ffffffffa03a3d00>] ?
drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug 4 08:23:23 octa12 kernel: [34849.339703] [<ffffffffa03a3d46>] ?
drbd_thread_setup+0x46/0x130 [drbd]
Aug 4 08:23:23 octa12 kernel: [34849.339737] [<ffffffffa03a3d00>] ?
drbd_destroy_connection+0xc0/0xc0 [drbd]
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user