Hi,
We are using DRBD with a RHEL cluster in a two-node + diskless tiebreaker setup:
Corosync, Pacemaker and DRBD (node1, node2, tiebreaker-diskless). Our versions
are kmod-drbd-9.0.27 and drbd-utils-9.12.1.
We are testing failover scenarios in a physical environment running Red Hat 7.6
by executing hard shutdowns on node1 and node2 (we always start from a stable
state with all nodes connected and UpToDate).
From time to time, when node2 is forcibly shut down, the volume on node1 moves
from UpToDate to Consistent, after which (a couple of milliseconds later) the
tiebreaker reports an error, resulting in the following drbdadm status:
node1:
postgres-zabbix-data7790 role:Secondary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  node2 connection:Connecting
  tiebreaker connection:Connecting
tiebreaker:
postgres-zabbix-data7790 role:Secondary suspended:quorum
  disk:Diskless quorum:no blocked:upper
  node1 connection:StandAlone
  node2 connection:Connecting
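In case it is useful for anyone repeating these tests, here is a minimal Python sketch that scans `drbdadm status` text for the symptoms shown above (lost quorum, suspended I/O, peers that are not Connected). The function name `find_problems` and the choice of checks are hypothetical, not part of drbd-utils, and the parsing assumes only the `key:value` token layout visible in the output above.

```python
import re

# Matches "key:value" tokens such as "quorum:no" or "connection:Connecting".
TOKEN = re.compile(r"([a-z][a-z-]*):(\S+)")


def find_problems(status_text):
    """Return a list of problem descriptions found in `drbdadm status` text.

    Hypothetical helper: it only looks at the key:value tokens on each line
    and does not try to model every field drbdadm can print.
    """
    problems = []
    for line in status_text.splitlines():
        words = line.split()
        if not words:
            continue
        fields = dict(TOKEN.findall(line))
        if "suspended" in fields:
            problems.append(f"I/O suspended ({fields['suspended']})")
        if fields.get("quorum") == "no":
            problems.append("no quorum")
        # Peer lines start with the peer name, followed by connection:<state>.
        if fields.get("connection", "Connected") != "Connected":
            problems.append(f"peer {words[0]}: {fields['connection']}")
    return problems


if __name__ == "__main__":
    sample = """\
postgres-zabbix-data7790 role:Secondary suspended:quorum
  disk:UpToDate quorum:no blocked:upper
  node2 connection:Connecting
  tiebreaker connection:Connecting
"""
    for problem in find_problems(sample):
        print(problem)
```

The text passed in would typically be the captured output of `drbdadm status postgres-zabbix-data7790` on each node after a forced shutdown.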
Resource configuration:
resource postgres-zabbix-data7790 {
    options {
        quorum majority;
    }
    protocol C;
    startup {
        wfc-timeout 10;
        degr-wfc-timeout 5;
    }
    net {
        max-epoch-size 2048;
        max-buffers 2048;
        sndbuf-size 0;
        rcvbuf-size 0;
    }
    disk {
        on-io-error detach;
        c-max-rate 900M;
        c-min-rate 100M;
        c-fill-target 1M;
        resync-rate 300M;
    }
    on node1 {
        device /dev/drbd7790;
        disk /dev/mapper/lvmdata-postgres--zabbix--data7790;
        node-id 1;
        meta-disk internal;
        address 10.21.24.11:7790;
    }
    on node2 {
        device /dev/drbd7790;
        disk /dev/mapper/lvmdata-postgres--zabbix--data7790;
        meta-disk internal;
        node-id 2;
        address 10.21.24.12:7790;
    }
    on tiebreaker {
        device /dev/drbd7790;
        disk none;
        meta-disk internal;
        node-id 3;
        address 10.21.24.13:7790;
    }
    connection-mesh {
        hosts node1 node2 tiebreaker;
    }
}
Tiebreaker logs:
2021-02-23T23:32:24.299214+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: PingAck did not arrive in time.;
2021-02-23T23:32:24.299244+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.299266+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: conn( Connected -> NetworkFailure ) peer(
Primary -> Unknown );
2021-02-23T23:32:24.299305+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790 node2: pdsk( UpToDate -> DUnknown ) repl(
Established -> Off );
2021-02-23T23:32:24.299326+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: ack_receiver terminated;
2021-02-23T23:32:24.299349+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Terminating ack_recv thread;
2021-02-23T23:32:24.299372+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.299394+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.318231+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Restarting sender thread;
2021-02-23T23:32:24.318268+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Connection closed;
2021-02-23T23:32:24.318290+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.318312+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: conn( NetworkFailure -> Unconnected );
2021-02-23T23:32:24.318337+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Restarting receiver thread;
2021-02-23T23:32:24.318370+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.318404+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: conn( Unconnected -> Connecting );
2021-02-23T23:32:24.803484+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Got NegDReply; Sector 0s, len 131072.;
2021-02-23T23:32:24.803557+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.803587+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790: State change failed: Need access to UpToDate data;
2021-02-23T23:32:24.803628+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790 node1: Failed: pdsk( UpToDate -> Consistent
);
2021-02-23T23:32:24.803652+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: drbd_req_destroy: Logic BUG rq_state:
(0:300000, 2:104), completion_ref = 0;
2021-02-23T23:32:24.805509+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790: susp-io( no -> quorum);
2021-02-23T23:32:24.805540+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: conn( Connected -> Disconnecting ) peer(
Secondary -> Unknown );
2021-02-23T23:32:24.805570+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: quorum( yes -> no );
2021-02-23T23:32:24.805599+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790 node1: pdsk( UpToDate -> DUnknown ) repl(
Established -> Off );
2021-02-23T23:32:24.805627+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: error receiving P_STATE, e: -5 l: 0!;
2021-02-23T23:32:24.807322+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: ack_receiver terminated;
2021-02-23T23:32:24.807435+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: Terminating ack_recv thread;
2021-02-23T23:32:24.818354+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: Aborting remote state change 0 commit not
possible;
2021-02-23T23:32:24.818465+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: Restarting sender thread;
2021-02-23T23:32:24.818542+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: Connection closed;
2021-02-23T23:32:24.818617+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: conn( Disconnecting -> StandAlone );
2021-02-23T23:32:24.818690+00:00;INFO;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node1: Terminating receiver thread;
Node1 logs:
2021-02-23T23:32:24.800547+00:00;ERR;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: PingAck did not arrive in time.;
2021-02-23T23:32:24.800610+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.800645+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: conn( Connected -> NetworkFailure ) peer(
Primary -> Unknown );
2021-02-23T23:32:24.800676+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: disk( UpToDate -> Consistent );
2021-02-23T23:32:24.800706+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790 node2: pdsk( UpToDate -> DUnknown ) repl(
Established -> Off );
2021-02-23T23:32:24.800742+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: ack_receiver terminated;
2021-02-23T23:32:24.800771+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Terminating ack_recv thread;
2021-02-23T23:32:24.800800+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.800829+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.800856+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790 tiebreaker: receive_peer_dagtag():
source-set-bitmap by rule 30;
2021-02-23T23:32:24.800886+00:00;ERR;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Can not satisfy peer's read request, no
local data.;
2021-02-23T23:32:24.802664+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Would lose quorum, but using tiebreaker
logic to keep;
2021-02-23T23:32:24.811534+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: sock was shut down by peer;
2021-02-23T23:32:24.811594+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790: susp-io( no -> quorum);
2021-02-23T23:32:24.811626+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: conn( Connected -> BrokenPipe ) peer(
Secondary -> Unknown );
2021-02-23T23:32:24.811662+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: quorum( yes -> no );
2021-02-23T23:32:24.811692+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790 tiebreaker: pdsk( Diskless -> DUnknown )
repl( Established -> Off );
2021-02-23T23:32:24.811721+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: ack_receiver terminated;
2021-02-23T23:32:24.811750+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Terminating ack_recv thread;
2021-02-23T23:32:24.811780+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790: Preparing cluster-wide state change 3151158077 (1->-1
0/0);
2021-02-23T23:32:24.811814+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790: Aborting cluster-wide state change 3151158077 (9ms)
rv = -19;
2021-02-23T23:32:24.816641+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Aborting remote state change 0 commit not
possible;
2021-02-23T23:32:24.816701+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Restarting sender thread;
2021-02-23T23:32:24.816925+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Connection closed;
2021-02-23T23:32:24.816962+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: conn( NetworkFailure -> Unconnected );
2021-02-23T23:32:24.816989+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: Restarting receiver thread;
2021-02-23T23:32:24.817018+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 node2: conn( Unconnected -> Connecting );
2021-02-23T23:32:24.826555+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Aborting remote state change 0 commit not
possible;
2021-02-23T23:32:24.826622+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Restarting sender thread;
2021-02-23T23:32:24.826651+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Connection closed;
2021-02-23T23:32:24.826681+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: conn( BrokenPipe -> Unconnected );
2021-02-23T23:32:24.826709+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Restarting receiver thread;
2021-02-23T23:32:24.826745+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: conn( Unconnected -> Connecting );
2021-02-23T23:32:24.912545+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790: Preparing cluster-wide state change 1764698033 (1->-1
0/0);
2021-02-23T23:32:24.912605+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790: Committing cluster-wide state change 1764698033 (0ms);
2021-02-23T23:32:24.912637+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: disk( Consistent -> UpToDate );
2021-02-23T23:32:34.322531+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: sock was shut down by peer;
2021-02-23T23:32:34.322577+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: conn( Connecting -> BrokenPipe );
2021-02-23T23:32:34.322599+00:00;WARNING;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: short read (expected size 8);
2021-02-23T23:32:34.345524+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Aborting remote state change 0 commit not
possible;
2021-02-23T23:32:34.345582+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Restarting sender thread;
2021-02-23T23:32:34.347540+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: Connection closed;
2021-02-23T23:32:34.347607+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: conn( BrokenPipe -> Unconnected );
2021-02-23T23:32:35.347753+00:00;INFO;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790 tiebreaker: conn( Unconnected -> Connecting );
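Since the ordering of events across the two hosts seems to matter here, a rough Python sketch for interleaving both logs into a single timeline may help when reading them. It assumes the semicolon-separated layout shown above (`timestamp;LEVEL;host;P-/;[kernel/]; message;`) and that continuation lines are only mail wrapping. The names `parse_records` and `merge_timelines` are hypothetical, and the resulting order is only as trustworthy as the clock synchronisation between the nodes.

```python
import re
from datetime import datetime

# A record starts with an ISO-8601 timestamp followed by ';'.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2};"
)


def parse_records(text):
    """Rejoin wrapped lines and split every record into its fields."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line):
            if current is not None:
                records.append(current)
            current = line
        elif current is not None:
            # Continuation of a wrapped record.
            current += " " + line
    if current is not None:
        records.append(current)

    parsed = []
    for rec in records:
        # Split only five times: the message itself may contain ';'.
        ts, level, host, _proc, _facility, msg = rec.split(";", 5)
        parsed.append({
            "time": datetime.fromisoformat(ts),
            "level": level,
            "host": host,
            "message": msg.strip().rstrip(";").strip(),
        })
    return parsed


def merge_timelines(*logs):
    """Merge several per-host logs into one list sorted by timestamp."""
    events = [event for log in logs for event in parse_records(log)]
    return sorted(events, key=lambda event: event["time"])


if __name__ == "__main__":
    tiebreaker_log = """\
2021-02-23T23:32:24.803484+00:00;ERR;tiebreaker;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Got NegDReply; Sector 0s, len 131072.;
"""
    node1_log = """\
2021-02-23T23:32:24.800886+00:00;ERR;node1;P-/;[kernel/]; drbd
postgres-zabbix-data7790/0 drbd7790: Can not satisfy peer's read request, no
local data.;
"""
    for event in merge_timelines(tiebreaker_log, node1_log):
        print(event["time"].time(), event["host"], event["level"], event["message"])
```

Run over the two excerpts above, this places node1's "Can not satisfy peer's read request" entry just before the tiebreaker's "Got NegDReply" entry, which matches the sequence we described.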
Do you have any advice?
Best Regards,
Mihai