Here's some more info. dmesg shows some suspicious-looking log messages, such as:

1) FIXME drbd_s_vm-117-s[2830] op clear, bitmap locked for 'receive bitmap' by drbd_r_vm-117-s[5038]

2) Wrong magic value 0xffff0007 in protocol version 114

3) peer request with dagtag 399201392 not found
   got_peer_ack [drbd] failed

4) Rejecting concurrent remote state change 2226202936 because of state change 2923939731
   Ignoring P_TWOPC_ABORT packet 2226202936.

5) drbd_r_vm-117-s[5038] going to 'detect_finished_resyncs()' but bitmap already locked for 'write from resync_finished' by drbd_w_vm-117-s[2812]
   md_sync_timer expired! Worker calls drbd_md_sync().

6) incompatible discard-my-data settings
   conn( Connecting -> Disconnecting )
   error receiving P_PROTOCOL, e: -5 l: 7!

Two of the four nodes have DRBD 9.0.15-1 and two have 9.0.16-1. All of them have API v16:

== mox-a ==
version: 9.0.15-1 (api:2/proto:86-114)
GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root@mox-a, 2018-10-28 03:08:58
Transports (api:16): tcp (9.0.15-1)

== mox-b ==
version: 9.0.15-1 (api:2/proto:86-114)
GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root@mox-b, 2018-10-10 17:50:25
Transports (api:16): tcp (9.0.15-1)

== mox-c ==
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root@mox-c, 2018-10-28 05:45:05
Transports (api:16): tcp (9.0.16-1)

== mox-d ==
version: 9.0.16-1 (api:2/proto:86-114)
GIT-hash: ab9777dfeaf9d619acc9a5201bfcae8103e9529c build by root@mox-d, 2018-10-29 00:22:23
Transports (api:16): tcp (9.0.16-1)

Running Proxmox (5.2-2), as you'd guess from the host names. The DRBD resources are managed by LINSTOR.

On Thu, 1 Nov 2018 at 17:32, Jarno Elonen <[email protected]> wrote:

> Okay, today one of these resources got a sudden, severe filesystem
> corruption on the primary.
>
> On the other hand, the secondaries (that showed 8ZiB out-of-sync) were
> still mountable after I disconnected the corrupted primary.
> No idea how current data the secondaries had, but drbdtop still showed
> them as connected and 8ZiB out-of-sync.
>
> This is getting quite worrisome. Is anyone else experiencing this with
> DRBD 9? Is it something really wrong in my setup, or are there perhaps some
> known instabilities in DRBD 9.0.15-1?
>
> -Jarno
>
>
> On Wed, 31 Oct 2018 at 20:46, Jarno Elonen <[email protected]> wrote:
>
>> I've got several DRBD 9 resources that constantly show *UpToDate* with
>> 9223372036854774304 bytes (exactly 8ZiB) of OutOfDate data.
>>
>> Any idea what might cause this and how to fix it?
>>
>> Example:
>>
>> # drbdsetup status --verbose --statistics vm-106-disk-1
>> vm-106-disk-1 node-id:0 role:Primary suspended:no
>> write-ordering:flush
>> volume:0 minor:1003 disk:UpToDate quorum:yes
>> size:16777688 read:215779 written:22369564 al-writes:89 bm-writes:0
>> upper-pending:0
>> lower-pending:0 al-suspended:no blocked:no
>> mox-a node-id:1 connection:Connected role:Secondary congested:no
>> ap-in-flight:0
>> rs-in-flight:18446744073709549808
>> volume:0 replication:Established peer-disk:UpToDate
>> resync-suspended:no
>> received:215116 sent:22368903 out-of-sync:9223372036854774304
>> pending:0 unacked:0
>> mox-c node-id:2 connection:Connected role:Secondary congested:no
>> ap-in-flight:0
>> rs-in-flight:18446744073709549808
>> volume:0 replication:Established peer-disk:UpToDate
>> resync-suspended:no
>> received:1188 sent:19884428 out-of-sync:0 pending:0 unacked:0
>>
>> Version info:
>> version: 9.0.15-1 (api:2/proto:86-114)
>> GIT-hash: c46d27900f471ea0f5ba587592439a9ddde1d08b build by root@mox-b,
>> 2018-10-10 17:50:25
>> Transports (api:16): tcp (9.0.15-1)
>>
>> -Jarno
>>
>>
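
A note on the two huge counters in the quoted status output: both sit just below a power of two, which is what an unsigned 64-bit counter looks like after it has been decremented below zero. The Python sketch below only does the arithmetic on the values copied from that output. The helper name as_signed64 is made up for illustration, and whether DRBD really underflows a counter here is an assumption, not something the logs confirm.

```python
# Values copied verbatim from the quoted `drbdsetup status` output.
RS_IN_FLIGHT = 18446744073709549808
OUT_OF_SYNC = 9223372036854774304


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed.

    Hypothetical helper for illustration; not part of any DRBD tool.
    """
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# rs-in-flight read as a signed 64-bit number is a small negative value.
print(as_signed64(RS_IN_FLIGHT))   # -1808

# out-of-sync is 1504 short of 2**63.
print((1 << 63) - OUT_OF_SYNC)     # 1504
```

If out-of-sync is reported in KiB (an assumption about the units), 2**63 KiB is 8 ZiB, which matches the figure drbdtop shows. Neither number looks like a real amount of data.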
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
