The 'Extent XXX beyond end of bitmap!' error described above is consistently reproducible in our environment. It's not clear what exactly triggered it, but it happened when Pacemaker was unable to fail over properly to another node due to a DRBD timeout issue, followed by a server reset.
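Nothing in our configuration (below) overrides the network timeouts, so the module defaults were in effect. For reference, the knobs that govern when DRBD declares a peer dead would be something along these lines (illustrative values only, not what we actually run):

resource master-drbd {
    net {
        timeout      60;   # in 0.1s units, i.e. 6s per request
        ping-timeout 5;    # 0.5s to answer a keep-alive ping
        ko-count     7;    # expel the peer after this many timed-out requests
    }
}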
# drbdadm status
sg-master-drbd role:Secondary
  disk:Diskless
  peer role:Primary
    replication:Established peer-disk:UpToDate
# drbdadm up all
extent 19136507 beyond end of bitmap!
extent 21495810 beyond end of bitmap!
extent 21785161 beyond end of bitmap!
... another 50+ entries similar to above ...
../shared/drbdmeta.c:2279:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - extents_size) failed.
sg-master-drbd: Failure: (102) Local address(port) already in use.
Command 'drbdsetup-84 connect sg-master-drbd ipv4:172.16.2.10:7801 ipv4:172.16.2.20:7801 --protocol=C --max-buffers=64K --sndbuf-size=1024K --after-sb-0pri=discard-younger-primary --after-sb-1pri=discard-secondary --after-sb-2pri=call-pri-lost-after-sb' terminated with exit code 10
# drbdadm attach all
extent 19136507 beyond end of bitmap!
extent 21495810 beyond end of bitmap!
extent 21785161 beyond end of bitmap!
...
../shared/drbdmeta.c:2279:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - extents_size) failed.
Previously we fixed this by recreating the DRBD meta data and fully resynchronizing the nodes, which is obviously the wrong way to handle it.
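Roughly, that workaround was the following on the node with the bad meta data (a sketch from memory; the exact invocations may have differed):

# drbdadm down sg-master-drbd        # stop the resource
# drbdadm create-md sg-master-drbd   # wipe and recreate the internal meta data
# drbdadm up sg-master-drbd
# drbdadm invalidate sg-master-drbd  # discard local data, force a full resync from the peer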
The configuration is fairly standard, with internal meta data and defaults for the AL and max-peers:
resource master-drbd {
    net {
        protocol C;
        max-buffers 64K;
        sndbuf-size 1024K;
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
    }
    disk {
        resync-rate 4000M;
        disk-barrier no;
        disk-flushes no;
        c-plan-ahead 0;
        read-balancing 1M-striping;
    }
    volume 0 {
        disk /dev/drbdpool/data;
        device /dev/drbd0;
        meta-disk internal;
    }
    on hcluster01 {
        address 172.16.2.10:7801;
    }
    on hcluster02 {
        address 172.16.2.20:7801;
    }
}
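To double-check that the defaults really are in effect, the parsed and runtime settings can be dumped (I'd expect 'drbdsetup show' to reveal the effective al-extents; I haven't pasted its output here):

# drbdadm dump all              # configuration as drbdadm parses it
# drbdsetup show sg-master-drbd # options currently in effect for the resource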
I'm not able to get a 'drbdadm dump-md'; it fails with the following error:
# drbdadm dump-md all
Found meta data is "unclean", please apply-al first
Command 'drbdmeta 0 v08 /dev/drbdpool/data internal dump-md' terminated with exit code 255
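That's a catch-22: dump-md refuses to run until the activity log has been applied, but applying the AL is exactly the step that asserts ('drbdadm up'/'attach' invoke drbdmeta's apply-al internally, which is where the ASSERT above comes from). So the direct attempt would be:

# drbdadm apply-al all

which I'd expect to die with the same 'extent ... beyond end of bitmap!' assert as the up/attach attempts above.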
The backend device for DRBD, 'dm-3', is the logical volume 'data' in volume group 'drbdpool', which combines two hardware RAID0 arrays (sda, sdc).
Reported sizes on the failed node:
# blockdev --report
RO    RA   SSZ   BSZ   StartSec             Size   Device
rw   256   512  4096          0   120009573531648   /dev/sda
rw   256   512  4096          0   100007977943040   /dev/sdc
rw   256   512  4096          0   220017543086080   /dev/dm-3

# blockdev --getsize /dev/drbd0
blockdev: cannot open /dev/drbd0: Wrong medium type
Reported sizes on the operational node:
# blockdev --report
RO    RA   SSZ   BSZ   StartSec             Size   Device
rw   256   512  4096          0   120009573531648   /dev/sda
rw   256   512  4096          0   100007977943040   /dev/sdc
rw   256   512  4096          0   220017543086080   /dev/dm-3
rw   256   512  4096          0   220010828644352   /dev/drbd0

# blockdev --getsize /dev/drbd0
429708649696
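A quick sanity check on those numbers (my own arithmetic, not tool output): the sector count matches the byte size, and the gap between the backing LV and /dev/drbd0 is roughly what internal meta data should reserve, i.e. about 1/32768 of the device for the bitmap (one bit per 4 KiB) plus the AL and superblock:

# echo $((429708649696 * 512))                 # 220010828644352 = drbd0 byte size from --report
# echo $((220017543086080 - 220010828644352))  # 6714441728 bytes reserved for meta data (~6.3 GiB)
# echo $((220017543086080 / 32768))            # 6714402560 = bitmap share; the rest is AL + superblock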
# vgdisplay
  --- Volume group ---
  VG Name               drbdpool
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               200.10 TiB
  PE Size               4.00 MiB
  Total PE              52456270
  Alloc PE / Size       52456270 / 200.10 TiB
  Free  PE / Size       0 / 0
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/drbdpool/data
  LV Name                data
  VG Name                drbdpool
  LV Write Access        read/write
  LV Status              available
  # open                 2
  LV Size                200.10 TiB
  Current LE             52456270
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
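The LVM numbers agree with blockdev as well: 52456270 extents of 4 MiB each come to exactly the dm-3 size reported above.

# echo $((52456270 * 4 * 1024 * 1024))   # 220017543086080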
# dmesg | grep drbd
[    1.863088] drbd: loading out-of-tree module taints kernel.
[    1.865879] drbd: module verification failed: signature and/or required key missing - tainting kernel
[    1.894498] drbd: initialized. Version: 8.4.11-1 (api:1/proto:86-101)
[    1.894501] drbd: GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-04-26 12:10:42
[    1.894502] drbd: registered as block device major 147
[   88.950747] drbd sg-master-drbd: Starting worker thread (from drbdsetup-84 [3242])
[   88.951999] drbd sg-master-drbd: conn( StandAlone -> Unconnected )
[   88.952532] drbd sg-master-drbd: Starting receiver thread (from drbd_w_sg-maste [3244])
[   88.952592] drbd sg-master-drbd: receiver (re)started
[   88.952656] drbd sg-master-drbd: conn( Unconnected -> WFConnection )
[   89.453261] drbd sg-master-drbd: Handshake successful: Agreed network protocol version 101
[   89.453271] drbd sg-master-drbd: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[   89.453358] drbd sg-master-drbd: conn( WFConnection -> WFReportParams )
[   89.453373] drbd sg-master-drbd: Starting ack_recv thread (from drbd_r_sg-maste [3245])
[   89.469010] block drbd0: max BIO size = 4096
[   89.469023] block drbd0: size = 200 TB (214854324848 KB)
[   89.469043] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[49807.178096] drbd sg-master-drbd: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[49807.178116] drbd sg-master-drbd: ack_receiver terminated
[49807.178124] drbd sg-master-drbd: Terminating drbd_a_sg-maste
[49807.192386] drbd sg-master-drbd: Connection closed
[49807.192452] drbd sg-master-drbd: conn( Disconnecting -> StandAlone )
[49807.192463] drbd sg-master-drbd: receiver terminated
[49807.192470] drbd sg-master-drbd: Terminating drbd_r_sg-maste
[49807.229346] drbd sg-master-drbd: Terminating drbd_w_sg-maste
[49847.525209] drbd sg-master-drbd: Starting worker thread (from drbdsetup-84 [23082])
[49847.525490] drbd sg-master-drbd: conn( StandAlone -> Unconnected )
[49847.525542] drbd sg-master-drbd: Starting receiver thread (from drbd_w_sg-maste [23084])
[49847.525624] drbd sg-master-drbd: receiver (re)started
[49847.525687] drbd sg-master-drbd: conn( Unconnected -> WFConnection )
[49848.025725] drbd sg-master-drbd: Handshake successful: Agreed network protocol version 101
[49848.025735] drbd sg-master-drbd: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[49848.025964] drbd sg-master-drbd: conn( WFConnection -> WFReportParams )
[49848.025979] drbd sg-master-drbd: Starting ack_recv thread (from drbd_r_sg-maste [23085])
[49848.036394] block drbd0: max BIO size = 4096
[49848.036407] block drbd0: size = 200 TB (214854324848 KB)
[49848.036427] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
//OE
-----Original Message-----
From: Robert Altnoeder <[email protected]>
To: [email protected]
Subject: Re: [DRBD-user] Extent XXX beyond end of bitmap!
Date: Tue, 14 Aug 2018 13:03:40 +0200
The following information would be useful for debugging:
- Internal or external meta data?
- Any special activity log configuration, like a striped AL, different AL stripe size, etc.?
- Any manually configured number of AL extents?
- Value of max-peers
- Reported size of the DRBD device in sectors
- Reported size of the backend device for DRBD in sectors
- Ideally, a 'drbdadm dump-md' of the meta data of the affected devices
br,
Robert
On 08/14/2018 10:02 AM, Yannis Milios wrote:
Does this happen on both nodes? What’s the status of the backing device (lvm)? Can you post the exact versions for both kernel module and utils? Any clue in the logs?
On Tue, 14 Aug 2018 at 06:57, Oleksiy Evin <[email protected]> wrote:

# drbdadm attach all
extent 19136522 beyond end of bitmap!
extent 19143798 beyond end of bitmap!
extent 19151565 beyond end of bitmap!
../shared/drbdmeta.c:2279:apply_al: ASSERT(bm_pos - bm_on_disk_pos <= chunk - extents_size) failed.
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user