I have a small test setup with two diskless linstor-satellite nodes and four diskful linstor-satellite nodes, one of which is also the linstor-controller.

The idea is that the diskless nodes are the compute nodes (Xen, running the VMs whose data is on linstor resources).

I have two test VMs. One was (and still is) working fine (an older Debian Linux VM, crossbowold); the other (a Windows 10 VM, jspiteriVM1) failed while I was attempting to install the Xen PV drivers (not sure if that is relevant or not). The other two resources (ns2 and windows-wm) are unused.

There is nothing relevant in the linstor error logs, but the linstor controller node has this in its kern.log:

Dec 30 10:50:44 castle kernel: [4103630.414725] drbd windows-wm san6.mytest.com.au: sock was shut down by peer
Dec 30 10:50:44 castle kernel: [4103630.414752] drbd windows-wm san6.mytest.com.au: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
Dec 30 10:50:44 castle kernel: [4103630.414759] drbd windows-wm/0 drbd1001 san6.mytest.com.au: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Dec 30 10:50:44 castle kernel: [4103630.414807] drbd windows-wm san6.mytest.com.au: ack_receiver terminated
Dec 30 10:50:44 castle kernel: [4103630.414810] drbd windows-wm san6.mytest.com.au: Terminating ack_recv thread
Dec 30 10:50:44 castle kernel: [4103630.445961] drbd windows-wm san6.mytest.com.au: Restarting sender thread
Dec 30 10:50:44 castle kernel: [4103630.479708] drbd windows-wm san6.mytest.com.au: Connection closed
Dec 30 10:50:44 castle kernel: [4103630.479739] drbd windows-wm san6.mytest.com.au: helper command: /sbin/drbdadm disconnected
Dec 30 10:50:44 castle kernel: [4103630.486479] drbd windows-wm san6.mytest.com.au: helper command: /sbin/drbdadm disconnected exit code 0
Dec 30 10:50:44 castle kernel: [4103630.486533] drbd windows-wm san6.mytest.com.au: conn( BrokenPipe -> Unconnected )
Dec 30 10:50:44 castle kernel: [4103630.486556] drbd windows-wm san6.mytest.com.au: Restarting receiver thread
Dec 30 10:50:44 castle kernel: [4103630.486566] drbd windows-wm san6.mytest.com.au: conn( Unconnected -> Connecting )
Dec 30 10:50:44 castle kernel: [4103631.006727] drbd windows-wm san6.mytest.com.au: Handshake to peer 2 successful: Agreed network protocol version 117
Dec 30 10:50:44 castle kernel: [4103631.006735] drbd windows-wm san6.mytest.com.au: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Dec 30 10:50:44 castle kernel: [4103631.006918] drbd windows-wm san6.mytest.com.au: Peer authenticated using 20 bytes HMAC
Dec 30 10:50:44 castle kernel: [4103631.006943] drbd windows-wm san6.mytest.com.au: Starting ack_recv thread (from drbd_r_windows- [1164])
Dec 30 10:50:44 castle kernel: [4103631.041925] drbd windows-wm/0 drbd1001 san6.mytest.com.au: drbd_sync_handshake:
Dec 30 10:50:44 castle kernel: [4103631.041932] drbd windows-wm/0 drbd1001 san6.mytest.com.au: self CC647323743B5AE0:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:120
Dec 30 10:50:44 castle kernel: [4103631.041937] drbd windows-wm/0 drbd1001 san6.mytest.com.au: peer CC647323743B5AE0:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:120
Dec 30 10:50:44 castle kernel: [4103631.041941] drbd windows-wm/0 drbd1001 san6.mytest.com.au: uuid_compare()=no-sync by rule 38
Dec 30 10:50:44 castle kernel: [4103631.229931] drbd windows-wm: Preparing cluster-wide state change 1880606796 (0->2 499/146)
Dec 30 10:50:44 castle kernel: [4103631.230424] drbd windows-wm: State change 1880606796: primary_nodes=0, weak_nodes=0
Dec 30 10:50:44 castle kernel: [4103631.230429] drbd windows-wm: Committing cluster-wide state change 1880606796 (0ms)
Dec 30 10:50:44 castle kernel: [4103631.230480] drbd windows-wm san6.mytest.com.au: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Dec 30 10:50:44 castle kernel: [4103631.230486] drbd windows-wm/0 drbd1001 san6.mytest.com.au: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
Dec 30 10:58:27 castle kernel: [4104093.577650] drbd jspiteriVM1 xen1.mytest.com.au: peer( Primary -> Secondary )
Dec 30 10:58:27 castle kernel: [4104093.790062] drbd jspiteriVM1/0 drbd1011: bitmap WRITE of 327 pages took 216 ms
Dec 30 10:58:39 castle kernel: [4104106.278699] drbd jspiteriVM1 xen1.mytest.com.au: Preparing remote state change 490644362
Dec 30 10:58:39 castle kernel: [4104106.278984] drbd jspiteriVM1 xen1.mytest.com.au: Committing remote state change 490644362 (primary_nodes=10)
Dec 30 10:58:39 castle kernel: [4104106.278999] drbd jspiteriVM1 xen1.mytest.com.au: peer( Secondary -> Primary )
Dec 30 10:58:40 castle kernel: [4104106.547178] drbd jspiteriVM1/0 drbd1011 xen1.mytest.com.au: resync-susp( no -> connection dependency )
Dec 30 10:58:40 castle kernel: [4104106.547191] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: repl( PausedSyncT -> SyncTarget ) resync-susp( peer -> no )
Dec 30 10:58:40 castle kernel: [4104106.547198] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Syncer continues.
Dec 30 11:04:29 castle kernel: [4104456.362585] drbd jspiteriVM1 xen1.mytest.com.au: peer( Primary -> Secondary )
Dec 30 11:04:30 castle kernel: [4104456.388543] drbd jspiteriVM1/0 drbd1011: bitmap WRITE of 1 pages took 24 ms
Dec 30 11:04:30 castle kernel: [4104456.401108] drbd jspiteriVM1/0 drbd1011 san6.mytest.com.au: pdsk( UpToDate -> Outdated )
Dec 30 11:04:30 castle kernel: [4104456.788360] drbd jspiteriVM1/0 drbd1011 san6.mytest.com.au: pdsk( Outdated -> Inconsistent )
Dec 30 11:09:15 castle kernel: [4104742.275721] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=2
Dec 30 11:09:15 castle kernel: [4104742.377977] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=2
Dec 30 11:09:16 castle kernel: [4104742.481920] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=3
Dec 30 11:09:16 castle kernel: [4104742.585933] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=4
Dec 30 11:09:16 castle kernel: [4104742.689909] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:16 castle kernel: [4104742.793898] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:16 castle kernel: [4104742.897895] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:16 castle kernel: [4104743.001927] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:16 castle kernel: [4104743.105909] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:16 castle kernel: [4104743.209908] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:16 castle kernel: [4104743.313927] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.417897] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.521909] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.575764] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.625902] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.729908] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.833894] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104743.937890] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
Dec 30 11:09:17 castle kernel: [4104744.041907] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=5
[this line repeats until Jan 2 02:33, probably when I rebooted it]

Jan  2 02:33:46 castle kernel: [4333012.494110] drbd jspiteriVM1 san5.mytest.com.au: Restarting sender thread
Jan  2 02:33:46 castle kernel: [4333012.528437] drbd jspiteriVM1 san5.mytest.com.au: Connection closed
Jan  2 02:33:46 castle kernel: [4333012.528447] drbd jspiteriVM1 san5.mytest.com.au: helper command: /sbin/drbdadm disconnected
Jan  2 02:33:46 castle kernel: [4333012.530942] drbd jspiteriVM1 san5.mytest.com.au: helper command: /sbin/drbdadm disconnected exit code 0
Jan  2 02:33:46 castle kernel: [4333012.530960] drbd jspiteriVM1 san5.mytest.com.au: conn( BrokenPipe -> Unconnected )
Jan  2 02:33:46 castle kernel: [4333012.530970] drbd jspiteriVM1 san5.mytest.com.au: Restarting receiver thread
Jan  2 02:33:46 castle kernel: [4333012.530974] drbd jspiteriVM1 san5.mytest.com.au: conn( Unconnected -> Connecting )
Jan  2 02:33:46 castle kernel: [4333013.054060] drbd jspiteriVM1 san5.mytest.com.au: Handshake to peer 1 successful: Agreed network protocol version 117
Jan  2 02:33:46 castle kernel: [4333013.054067] drbd jspiteriVM1 san5.mytest.com.au: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jan  2 02:33:46 castle kernel: [4333013.054426] drbd jspiteriVM1 san5.mytest.com.au: Peer authenticated using 20 bytes HMAC
Jan  2 02:33:46 castle kernel: [4333013.054452] drbd jspiteriVM1 san5.mytest.com.au: Starting ack_recv thread (from drbd_r_jspiteri [1046])
Jan  2 02:33:46 castle kernel: [4333013.085933] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: drbd_sync_handshake:
Jan  2 02:33:46 castle kernel: [4333013.085941] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: self 122E90789B3D90E2:122E90789B3D90E3:4D2D1C8F63C38B44:B1B847713A96996E bits:21168661 flags:124
Jan  2 02:33:46 castle kernel: [4333013.085946] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: peer 2B520E804A7D4EAC:0000000000000000:4D2D1C8F63C38B44:B1B847713A96996E bits:21168661 flags:124
Jan  2 02:33:46 castle kernel: [4333013.085952] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: uuid_compare()=target-set-bitmap by rule 60
Jan  2 02:33:46 castle kernel: [4333013.085956] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Setting and writing one bitmap slot, after drbd_sync_handshake
Jan  2 02:33:46 castle kernel: [4333013.226948] drbd jspiteriVM1/0 drbd1011: bitmap WRITE of 1078 pages took 88 ms
Jan  2 02:33:46 castle kernel: [4333013.278401] drbd jspiteriVM1: Preparing cluster-wide state change 3482568163 (0->1 499/146)
Jan  2 02:33:46 castle kernel: [4333013.278980] drbd jspiteriVM1: State change 3482568163: primary_nodes=0, weak_nodes=0
Jan  2 02:33:46 castle kernel: [4333013.278985] drbd jspiteriVM1: Committing cluster-wide state change 3482568163 (0ms)
Jan  2 02:33:46 castle kernel: [4333013.279050] drbd jspiteriVM1 san5.mytest.com.au: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jan  2 02:33:46 castle kernel: [4333013.279055] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: repl( Off -> WFBitMapT )
Jan  2 02:33:46 castle kernel: [4333013.326494] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan  2 02:33:46 castle kernel: [4333013.337300] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
Jan  2 02:33:46 castle kernel: [4333013.337313] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: helper command: /sbin/drbdadm before-resync-target
Jan  2 02:33:46 castle kernel: [4333013.339475] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: helper command: /sbin/drbdadm before-resync-target exit code 0
Jan  2 02:33:46 castle kernel: [4333013.339503] drbd jspiteriVM1/0 drbd1011 xen1.mytest.com.au: resync-susp( no -> connection dependency )
Jan  2 02:33:46 castle kernel: [4333013.339504] drbd jspiteriVM1/0 drbd1011 san7.mytest.com.au: resync-susp( no -> connection dependency )
Jan  2 02:33:46 castle kernel: [4333013.339505] drbd jspiteriVM1/0 drbd1011 san6.mytest.com.au: resync-susp( no -> connection dependency )
Jan  2 02:33:46 castle kernel: [4333013.339507] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: repl( WFBitMapT -> SyncTarget )
Jan  2 02:33:46 castle kernel: [4333013.339552] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Began resync as SyncTarget (will sync 104859732 KB [26214933 bits set]).
Jan  2 02:50:55 castle kernel: [4334042.151194] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Retrying drbd_rs_del_all() later. refcnt=2
Jan  2 02:50:55 castle kernel: [4334042.254225] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: Resync done (total 1028 sec; paused 0 sec; 102000 K/sec)
Jan  2 02:50:55 castle kernel: [4334042.254230] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: expected n_oos:23691797 to be equal to rs_failed:23727152
Jan  2 02:50:55 castle kernel: [4334042.254232] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au:             23727152 failed blocks
Jan  2 02:50:55 castle kernel: [4334042.254245] drbd jspiteriVM1/0 drbd1011 xen1.mytest.com.au: resync-susp( connection dependency -> no )
Jan  2 02:50:55 castle kernel: [4334042.254247] drbd jspiteriVM1/0 drbd1011 san7.mytest.com.au: resync-susp( connection dependency -> no )
Jan  2 02:50:55 castle kernel: [4334042.254249] drbd jspiteriVM1/0 drbd1011 san6.mytest.com.au: resync-susp( connection dependency -> no )
Jan  2 02:50:55 castle kernel: [4334042.254252] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: pdsk( Outdated -> UpToDate ) repl( SyncTarget -> Established )
Jan  2 02:50:55 castle kernel: [4334042.281495] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: helper command: /sbin/drbdadm after-resync-target
Jan  2 02:50:55 castle kernel: [4334042.289879] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: helper command: /sbin/drbdadm after-resync-target exit code 0
Jan  2 02:50:55 castle kernel: [4334042.289879] drbd jspiteriVM1/0 drbd1011 san5.mytest.com.au: pdsk( UpToDate -> Inconsistent )
Jan  2 10:23:28 castle kernel: [4361194.855074] drbd windows-wm san7.mytest.com.au: sock was shut down by peer
Jan  2 10:23:28 castle kernel: [4361194.855101] drbd windows-wm san7.mytest.com.au: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
Jan  2 10:23:28 castle kernel: [4361194.855109] drbd windows-wm/0 drbd1001 san7.mytest.com.au: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
Jan  2 10:23:28 castle kernel: [4361194.855161] drbd windows-wm san7.mytest.com.au: ack_receiver terminated
Jan  2 10:23:28 castle kernel: [4361194.855164] drbd windows-wm san7.mytest.com.au: Terminating ack_recv thread
Jan  2 10:23:28 castle kernel: [4361194.882138] drbd windows-wm san7.mytest.com.au: Restarting sender thread
Jan  2 10:23:28 castle kernel: [4361194.961402] drbd windows-wm san7.mytest.com.au: Connection closed
Jan  2 10:23:28 castle kernel: [4361194.961435] drbd windows-wm san7.mytest.com.au: helper command: /sbin/drbdadm disconnected
Jan  2 10:23:28 castle kernel: [4361194.968763] drbd windows-wm san7.mytest.com.au: helper command: /sbin/drbdadm disconnected exit code 0
Jan  2 10:23:28 castle kernel: [4361194.968800] drbd windows-wm san7.mytest.com.au: conn( BrokenPipe -> Unconnected )
Jan  2 10:23:28 castle kernel: [4361194.968812] drbd windows-wm san7.mytest.com.au: Restarting receiver thread
Jan  2 10:23:28 castle kernel: [4361194.968816] drbd windows-wm san7.mytest.com.au: conn( Unconnected -> Connecting )
Jan  2 10:23:29 castle kernel: [4361195.486059] drbd windows-wm san7.mytest.com.au: Handshake to peer 3 successful: Agreed network protocol version 117
Jan  2 10:23:29 castle kernel: [4361195.486066] drbd windows-wm san7.mytest.com.au: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
Jan  2 10:23:29 castle kernel: [4361195.486490] drbd windows-wm san7.mytest.com.au: Peer authenticated using 20 bytes HMAC
Jan  2 10:23:29 castle kernel: [4361195.486515] drbd windows-wm san7.mytest.com.au: Starting ack_recv thread (from drbd_r_windows- [1165])
Jan  2 10:23:29 castle kernel: [4361195.517928] drbd windows-wm/0 drbd1001 san7.mytest.com.au: drbd_sync_handshake:
Jan  2 10:23:29 castle kernel: [4361195.517935] drbd windows-wm/0 drbd1001 san7.mytest.com.au: self CC647323743B5AE0:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:120
Jan  2 10:23:29 castle kernel: [4361195.517940] drbd windows-wm/0 drbd1001 san7.mytest.com.au: peer CC647323743B5AE0:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:120
Jan  2 10:23:29 castle kernel: [4361195.517944] drbd windows-wm/0 drbd1001 san7.mytest.com.au: uuid_compare()=no-sync by rule 38
Jan  2 10:23:29 castle kernel: [4361195.677932] drbd windows-wm: Preparing cluster-wide state change 3667329610 (0->3 499/146)
Jan  2 10:23:29 castle kernel: [4361195.678459] drbd windows-wm: State change 3667329610: primary_nodes=0, weak_nodes=0
Jan  2 10:23:29 castle kernel: [4361195.678466] drbd windows-wm: Committing cluster-wide state change 3667329610 (0ms)
Jan  2 10:23:29 castle kernel: [4361195.678516] drbd windows-wm san7.mytest.com.au: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
Jan  2 10:23:29 castle kernel: [4361195.678522] drbd windows-wm/0 drbd1001 san7.mytest.com.au: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

castle:/var/log# linstor resource list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊             State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ crossbowold  ┊ castle ┊ 7010 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-07 00:46:23 ┊
┊ crossbowold  ┊ flail  ┊ 7010 ┊ Unused ┊ Ok ┊ Diskless ┊ 2021-01-04 05:03:20 ┊
┊ crossbowold  ┊ san5   ┊ 7010 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-07 00:46:23 ┊
┊ crossbowold  ┊ san6   ┊ 7010 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-07 00:46:22 ┊
┊ crossbowold  ┊ san7   ┊ 7010 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-07 00:46:21 ┊
┊ crossbowold  ┊ xen1   ┊ 7010 ┊ InUse  ┊ Ok ┊ Diskless ┊ 2020-10-15 00:30:31 ┊
┊ jspiteriVM1  ┊ castle ┊ 7011 ┊ Unused ┊ StandAlone(san6.mytest.com.au,san7.mytest.com.au) ┊ SyncTarget(0.00%) ┊ 2020-10-14 22:15:00 ┊
┊ jspiteriVM1  ┊ san5   ┊ 7011 ┊ Unused ┊ Connecting(san7.mytest.com.au) ┊ Inconsistent ┊ 2020-10-14 22:14:59 ┊
┊ jspiteriVM1  ┊ san6   ┊ 7011 ┊ Unused ┊ Connecting(castle.mytest.com.au,san7.mytest.com.au) ┊ SyncTarget(0.00%) ┊ 2020-10-14 22:14:58 ┊
┊ jspiteriVM1  ┊ san7   ┊ 7011 ┊ Unused ┊ Connecting(castle.mytest.com.au),StandAlone(san6.mytest.com.au,san5.mytest.com.au) ┊ Inconsistent ┊ 2020-10-14 22:14:58 ┊
┊ jspiteriVM1  ┊ xen1   ┊ 7011 ┊ Unused ┊ Ok ┊ Diskless ┊ 2020-11-20 20:39:20 ┊
┊ ns2          ┊ castle ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-28 23:22:13 ┊
┊ ns2          ┊ flail  ┊ 7000 ┊ Unused ┊ Ok ┊ Diskless ┊ 2021-01-04 05:03:42 ┊
┊ ns2          ┊ san5   ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-28 23:22:12 ┊
┊ ns2          ┊ san6   ┊ 7000 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-10-28 23:22:11 ┊
┊ ns2          ┊ xen1   ┊ 7000 ┊ Unused ┊ Ok ┊ Diskless ┊ 2020-10-28 23:30:20 ┊
┊ windows-wm   ┊ castle ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-09-30 00:03:41 ┊
┊ windows-wm   ┊ flail  ┊ 7001 ┊ Unused ┊ Ok ┊ Diskless ┊ 2021-01-04 05:03:48 ┊
┊ windows-wm   ┊ san5   ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-09-30 00:03:40 ┊
┊ windows-wm   ┊ san6   ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-09-30 00:03:39 ┊
┊ windows-wm   ┊ san7   ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2020-09-30 00:13:05 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Could anyone determine from this, or advise what additional logs I should examine, to work out why this failed? I don't see anything obvious as to what caused linstor/drbd to fail here; all nodes were online and uninterrupted as far as I can tell. All physical storage is backed by MD RAID arrays, so there is some protection against disk failures (I haven't noticed any, in any case).
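In case it helps anyone reading through the noise: this is roughly the filter I used to pull just the DRBD state transitions for one resource out of kern.log (a sketch only; the function name is mine, and the embedded sample stands in for /var/log/kern.log on the controller):

```shell
#!/bin/sh
# filter_transitions RESOURCE
# Keeps only the lines where DRBD reports a connection/disk/replication/role
# state change for the given resource; drops handshake, bitmap, helper noise.
filter_transitions() {
  grep "drbd $1" | grep -E 'conn\(|pdsk\(|repl\(|peer\('
}

# Demo against sample lines from the log above; on the controller you would
# instead do: filter_transitions jspiteriVM1 < /var/log/kern.log
filter_transitions jspiteriVM1 <<'EOF'
Dec 30 10:58:27 castle kernel: [4104093.577650] drbd jspiteriVM1 xen1.mytest.com.au: peer( Primary -> Secondary )
Dec 30 10:58:27 castle kernel: [4104093.790062] drbd jspiteriVM1/0 drbd1011: bitmap WRITE of 327 pages took 216 ms
Dec 30 11:04:30 castle kernel: [4104456.401108] drbd jspiteriVM1/0 drbd1011 san6.mytest.com.au: pdsk( UpToDate -> Outdated )
EOF
```

This prints only the peer(...) and pdsk(...) lines and drops the bitmap-write line, which makes the timeline of the failure much easier to follow.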

I've since done an upgrade to the latest version of the drbd/linstor components on all nodes.

Finally, what could I do to recover the data? Has it been destroyed, or do I just need to select a node and tell linstor that this node has up-to-date data? Or can linstor work that out somehow?
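My current understanding from the DRBD docs (please correct me if this is wrong) is that recovery would mean picking the replica I believe is good and forcing the Inconsistent ones to do a full resync from it. Something like the following, shown as a dry run; the run() wrapper and the choice of node are my assumptions, not a verified procedure:

```shell
#!/bin/sh
# Dry-run sketch only -- every command is echoed, not executed.
RES=jspiteriVM1                  # the broken resource from the listing above
run() { echo "WOULD RUN: $*"; }

# On each node whose replica is Inconsistent/StandAlone, discard the local
# data and let it become a full-sync target from the good replica:
run drbdadm disconnect "$RES"
run drbdadm invalidate "$RES"
run drbdadm connect "$RES"

# Then watch the resync progress:
run drbdadm status "$RES"
```

Is that roughly the right approach here, or does LINSTOR have a preferred way of doing this that I should use instead?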

Regards,
Adam

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
[email protected]
https://lists.linbit.com/mailman/listinfo/drbd-user
