More clues: I just witnessed a resync (after an invalidate) go steadily from 100% out-of-sync to 0% (with several automatic disconnects and reconnects along the way). Immediately after reaching 0%, it jumped to -<very-large-number>%! After that, drbdtop started showing 8.0ZiB out-of-sync.
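A quick sanity check on the figures from the `drbdsetup status` output quoted further down. This is only a rough sketch of the arithmetic, not anything taken from the DRBD source. Reinterpreted as a signed 64-bit integer, the rs-in-flight value is a small negative number, and the out-of-sync value sits just below 2^63. The helper names are made up for illustration, and the idea that the out-of-sync figure is a negative 64-bit counter halved after wrapping is purely a guess.

```python
# Values copied verbatim from the quoted `drbdsetup status` output.
RS_IN_FLIGHT = 18446744073709549808
OUT_OF_SYNC = 9223372036854774304

MASK64 = (1 << 64) - 1


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def wrapped_then_halved(signed_count: int) -> int:
    """Hypothetical: a negative counter stored as unsigned 64-bit, then halved."""
    return (signed_count & MASK64) >> 1


# rs-in-flight looks like a counter that dropped below zero.
print(as_signed64(RS_IN_FLIGHT))        # -1808

# out-of-sync is 1504 short of 2**63 ...
print((1 << 63) - OUT_OF_SYNC)          # 1504

# ... which is what -3008 gives after wrapping to unsigned and halving.
print(wrapped_then_halved(-3008) == OUT_OF_SYNC)   # True

# 2**63 KiB is 8 ZiB (assuming the field is reported in KiB).
print((1 << 63) * 1024 == 8 * (1 << 70))           # True
```

If out-of-sync is reported in KiB (which I believe is what the drbdsetup man page says, but treat that as an assumption), the value is just under 2^63 KiB, i.e. just under 8 ZiB, which would match what drbdtop displays.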
Looks like a severe wrap-around bug.

-Jarno

On Thu, 1 Nov 2018 at 22:30, Jarno Elonen <[email protected]> wrote:

> Here's some more info.
> Dmesg shows some suspicious-looking log messages, such as:
>
> 1) FIXME drbd_s_vm-117-s[2830] op clear, bitmap locked for 'receive
> bitmap' by drbd_r_vm-117-s[5038]
>
> 2) Wrong magic value 0xffff0007 in protocol version 114
>
> 3) peer request with dagtag 399201392 not found
> got_peer_ack [drbd] failed
>
> 4) Rejecting concurrent remote state change 2226202936 because of state
> change 2923939731
> Ignoring P_TWOPC_ABORT packet 2226202936.
>
> 5) drbd_r_vm-117-s[5038] going to 'detect_finished_resyncs()' but bitmap
> already locked for 'write from resync_finished' by drbd_w_vm-117-s[2812]
> md_sync_timer expired! Worker calls drbd_md_sync().
>
> 6) incompatible discard-my-data settings
> conn( Connecting -> Disconnecting )
> error receiving P_PROTOCOL, e: -5 l: 7!
>
> Two of the four nodes have DRBD 9.0.15-1 and two have 9.0.16-1. All of
> them API v 16:
>
> == mox-a ==
> version: 9.0.15-1 (api:2/proto:86-114)
> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root@mox-a,
> 2018-10-28 03:08:58
> Transports (api:16): tcp (9.0.15-1)
>
> == mox-b ==
> version: 9.0.15-1 (api:2/proto:86-114)
> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root@mox-b,
> 2018-10-10 17:50:25
> Transports (api:16): tcp (9.0.15-1)
>
> == mox-c ==
> version: 9.0.16-1 (api:2/proto:86-114)
> GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root@mox-c,
> 2018-10-28 05:45:05
> Transports (api:16): tcp (9.0.16-1)
>
> == mox-d ==
> version: 9.0.16-1 (api:2/proto:86-114)
> GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root@mox-d,
> 2018-10-29 00:22:23
> Transports (api:16): tcp (9.0.16-1)
>
> Running Proxmox (5.2-2), as you'd guess from the host names. DRBD resources
> are being managed by LINSTOR.
>
>
> On Thu, 1 Nov 2018 at 17:32, Jarno Elonen <[email protected]> wrote:
>
>> Okay, today one of these resources got a sudden, severe filesystem
>> corruption on the primary.
>>
>> On the other hand, the secondaries (that showed 8ZiB out-of-sync) were
>> still mountable after I disconnected the corrupted primary. No idea how
>> current the data on the secondaries was, but drbdtop still showed them as
>> connected and 8ZiB out-of-sync.
>>
>> This is getting quite worrisome. Is anyone else experiencing this with
>> DRBD 9? Is there something really wrong in my setup, or are there perhaps
>> some known instabilities in DRBD 9.0.15-1?
>>
>> -Jarno
>>
>>
>> On Wed, 31 Oct 2018 at 20:46, Jarno Elonen <[email protected]> wrote:
>>
>>> I've got several DRBD 9 resources that constantly show *UpToDate* with
>>> 9223372036854774304 bytes (exactly 8ZiB) of OutOfDate data.
>>>
>>> Any idea what might cause this and how to fix it?
>>>
>>> Example:
>>>
>>> # drbdsetup status --verbose --statistics vm-106-disk-1
>>> vm-106-disk-1 node-id:0 role:Primary suspended:no
>>> write-ordering:flush
>>> volume:0 minor:1003 disk:UpToDate quorum:yes
>>> size:16777688 read:215779 written:22369564 al-writes:89
>>> bm-writes:0 upper-pending:0
>>> lower-pending:0 al-suspended:no blocked:no
>>> mox-a node-id:1 connection:Connected role:Secondary congested:no
>>> ap-in-flight:0
>>> rs-in-flight:18446744073709549808
>>> volume:0 replication:Established peer-disk:UpToDate
>>> resync-suspended:no
>>> received:215116 sent:22368903 out-of-sync:9223372036854774304
>>> pending:0 unacked:0
>>> mox-c node-id:2 connection:Connected role:Secondary congested:no
>>> ap-in-flight:0
>>> rs-in-flight:18446744073709549808
>>> volume:0 replication:Established peer-disk:UpToDate
>>> resync-suspended:no
>>> received:1188 sent:19884428 out-of-sync:0 pending:0 unacked:0
>>>
>>> Version info:
>>> version: 9.0.15-1 (api:2/proto:86-114)
>>> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root@mox-b,
>>> 2018-10-10 17:50:25
>>> Transports (api:16): tcp (9.0.15-1)
>>>
>>> -Jarno
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
