Hello!

I found the problem.
Even though I created the partitions underlying the DRBD device with exactly the same size as on the old server and on the still-running primary, they turned out to be too small.
Maybe the newer version of DRBD uses more space for the meta data?
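
For anyone who wants to check sizes before creating the partitions, here is a rough sketch of the estimate for internal meta data. It assumes the formula given in the DRBD 8.3 user's guide (ceil(Cs / 2^18) * 8 + 72 sectors, with 512-byte sectors); the function names are my own, and it says nothing about whether 8.3.11 actually reserves more space than 8.3.7.

```python
SECTOR_BYTES = 512          # assumption: classic 512-byte sectors
SECTORS_PER_CHUNK = 2 ** 18  # divisor taken from the user's guide estimate


def internal_md_sectors(backing_sectors: int) -> int:
    """Estimated size of DRBD internal meta data, in sectors."""
    # Integer ceiling division avoids floating-point rounding on large devices.
    chunks = -(-backing_sectors // SECTORS_PER_CHUNK)
    return chunks * 8 + 72


def usable_sectors(backing_sectors: int) -> int:
    """Rough number of sectors left for data after internal meta data."""
    return backing_sectors - internal_md_sectors(backing_sectors)


def sectors_from_bytes(size_bytes: int) -> int:
    """Convert a partition size in bytes to whole sectors."""
    return size_bytes // SECTOR_BYTES
```

Comparing usable_sectors() for the partition sizes on both nodes should show whether the new secondary ends up smaller than what the primary already uses.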

What made it difficult to find was that, although nearly all log messages went to /var/log/messages, the vital message went only to the syslog:

Aug 24 21:59:14 www1 kernel: [37868732.971832] block drbd1: conn( StandAlone -> Unconnected )
Aug 24 21:59:14 www1 kernel: [37868732.971887] block drbd1: Starting receiver thread (from drbd1_worker [1733])
Aug 24 21:59:14 www1 kernel: [37868732.972202] block drbd1: receiver (re)started
Aug 24 21:59:14 www1 kernel: [37868732.972222] block drbd1: conn( Unconnected -> WFConnection )
Aug 24 21:59:15 www1 kernel: [37868733.471248] block drbd1: Handshake successful: Agreed network protocol version 91
Aug 24 21:59:15 www1 kernel: [37868733.471284] block drbd1: conn( WFConnection -> WFReportParams )
Aug 24 21:59:15 www1 kernel: [37868733.471344] block drbd1: Starting asender thread (from drbd1_receiver [22671])
Aug 24 21:59:15 www1 kernel: [37868733.471571] block drbd1: data-integrity-alg: <not-used>
Aug 24 21:59:15 www1 kernel: [37868733.471602] block drbd1: The peer's disk size is too small!
Aug 24 21:59:15 www1 kernel: [37868733.471623] block drbd1: conn( WFReportParams -> Disconnecting )
Aug 24 21:59:15 www1 kernel: [37868733.471645] block drbd1: error receiving ReportSizes, l: 32!
Aug 24 21:59:15 www1 kernel: [37868733.471680] block drbd1: asender terminated
Aug 24 21:59:15 www1 kernel: [37868733.471699] block drbd1: Terminating drbd1_asender
Aug 24 21:59:15 www1 kernel: [37868733.471901] block drbd1: Connection closed
Aug 24 21:59:15 www1 kernel: [37868733.471926] block drbd1: conn( Disconnecting -> StandAlone )
Aug 24 21:59:15 www1 kernel: [37868733.471967] block drbd1: receiver terminated
Aug 24 21:59:15 www1 kernel: [37868733.471982] block drbd1: Terminating drbd1_receiver
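
Since the decisive line ended up in a different file than the rest, a small script that scans several log files at once might save someone else the search. This is only a sketch: the search pattern is my own guess at what counts as a "vital" DRBD message (based on the two lines above), and the exact syslog path (/var/log/syslog) is an assumption about the distribution.

```python
import re
from pathlib import Path

# Assumption: these two phrases (taken from the log above) mark the
# messages that explain a dropped connection; extend as needed.
PATTERN = re.compile(r"block drbd\d+: .*(too small|error receiving)")


def vital_lines(lines):
    """Return the lines that match PATTERN, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan(paths):
    """Map each readable log file to its matching lines (files with no hits are omitted)."""
    hits = {}
    for path in map(Path, paths):
        if not path.is_file():
            continue
        try:
            with path.open(errors="replace") as handle:
                found = vital_lines(handle)
        except PermissionError:
            continue  # log files are often readable by root only
        if found:
            hits[str(path)] = found
    return hits

# Example call (paths are assumptions):
# scan(["/var/log/messages", "/var/log/syslog"])
```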

After resizing the partitions, everything is running smoothly.


Thanks a lot for your efforts!

Koschi

On 25.08.2014 at 16:33, Christian Koschmieder wrote:
Hello Roland,

Sorry, I didn't attach it because it does not seem to have any relevant information in it. But of course, here it is:

Aug 24 21:59:14 www1 kernel: [37868732.971832] block drbd1: conn( StandAlone -> Unconnected ).
Aug 24 21:59:14 www1 kernel: [37868732.971887] block drbd1: Starting receiver thread (from drbd1_worker [1733])
Aug 24 21:59:14 www1 kernel: [37868732.972202] block drbd1: receiver (re)started
Aug 24 21:59:14 www1 kernel: [37868732.972222] block drbd1: conn( Unconnected -> WFConnection ).
Aug 24 21:59:15 www1 kernel: [37868733.471248] block drbd1: Handshake successful: Agreed network protocol version 91
Aug 24 21:59:15 www1 kernel: [37868733.471284] block drbd1: conn( WFConnection -> WFReportParams ).
Aug 24 21:59:15 www1 kernel: [37868733.471344] block drbd1: Starting asender thread (from drbd1_receiver [22671])
Aug 24 21:59:15 www1 kernel: [37868733.471571] block drbd1: data-integrity-alg: <not-used>
Aug 24 21:59:15 www1 kernel: [37868733.471623] block drbd1: conn( WFReportParams -> Disconnecting ).
Aug 24 21:59:15 www1 kernel: [37868733.471680] block drbd1: asender terminated
Aug 24 21:59:15 www1 kernel: [37868733.471699] block drbd1: Terminating drbd1_asender
Aug 24 21:59:15 www1 kernel: [37868733.471901] block drbd1: Connection closed
Aug 24 21:59:15 www1 kernel: [37868733.471926] block drbd1: conn( Disconnecting -> StandAlone ).
Aug 24 21:59:15 www1 kernel: [37868733.471967] block drbd1: receiver terminated
Aug 24 21:59:15 www1 kernel: [37868733.471982] block drbd1: Terminating drbd1_receiver


Kind regards,

Koschi

On 25.08.2014 at 14:23, Roland Friedwagner wrote:
Hi,

can you provide the log (from the same connection attempt) from
the other node (primary) also?

regards roland

On Sunday, 24 August 2014 22:09:44, Christian Koschmieder wrote:
I have two servers to host a website.
Only one is actively used at a time, the other one acts as hot standby.
All data is replicated via DRBD from the currently active server
(primary) to the backup server (secondary).

I recently had to set up a new secondary, because the original one had
hardware problems.
So I followed the instructions in the documentation
(http://www.drbd.org/users-guide-8.3/s-node-failure.html#s-perm-node-failure).

The status of the primary node:
version: 8.3.7 (api:88/proto:86-91)
srcversion: EE47D8BF18AC166BE219757

   1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
      ns:0 nr:0 dw:202926340 dr:247194962 al:2452 bm:757 lo:0 pe:0 ua:0
ap:0 ep:1 wo:b oos:215272

The status of the secondary node:
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9

1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r-----
      ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
oos:242699884

This seems to be all right.
But when issuing a connect on the primary it immediately disconnects again.
The log on the secondary has the following entries:

Aug 24 21:59:15 www2 kernel: [ 3780.076072] block drbd1: Handshake
successful: Agreed network protocol version 91
Aug 24 21:59:15 www2 kernel: [ 3780.076122] block drbd1: conn(
WFConnection -> WFReportParams )
Aug 24 21:59:15 www2 kernel: [ 3780.076180] block drbd1: Starting
asender thread (from drbd1_receiver [2502])
Aug 24 21:59:15 www2 kernel: [ 3780.077178] block drbd1:
data-integrity-alg: <not-used>
Aug 24 21:59:15 www2 kernel: [ 3780.077235] block drbd1:
drbd_sync_handshake:
Aug 24 21:59:15 www2 kernel: [ 3780.077272] block drbd1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:60674971 flags:0
Aug 24 21:59:15 www2 kernel: [ 3780.077328] block drbd1: peer
E2227E948E7B07CD:4445769C1EF0ADCC:B744D0729CC042CC:5AD0061929ED5B9D
bits:53813 flags:0
Aug 24 21:59:15 www2 kernel: [ 3780.077343] block drbd1: conn(
WFReportParams -> NetworkFailure )
Aug 24 21:59:15 www2 kernel: [ 3780.077349] block drbd1: asender terminated
Aug 24 21:59:15 www2 kernel: [ 3780.077351] block drbd1: Terminating
drbd1_asender
Aug 24 21:59:15 www2 kernel: [ 3780.077509] block drbd1:
uuid_compare()=-2 by rule 20
Aug 24 21:59:15 www2 kernel: [ 3780.077549] block drbd1: Becoming sync
target due to disk states.
Aug 24 21:59:15 www2 kernel: [ 3780.077586] block drbd1: Writing the
whole bitmap, full sync required after drbd_sync_handshake.
Aug 24 21:59:15 www2 kernel: [ 3780.162981] block drbd1: bitmap WRITE of
1852 pages took 10 jiffies
Aug 24 21:59:15 www2 kernel: [ 3780.224437] block drbd1: 231 GB
(60674971 bits) marked out-of-sync by on disk bit-map.
Aug 24 21:59:15 www2 kernel: [ 3780.232894] block drbd1:
drbd_sync_handshake:
Aug 24 21:59:15 www2 kernel: [ 3780.232932] block drbd1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:60674971 flags:0
Aug 24 21:59:15 www2 kernel: [ 3780.232975] block drbd1: peer
E2227E948E7B07CD:4445769C1EF0ADCC:B744D0729CC042CC:5AD0061929ED5B9D
bits:53813 flags:0
Aug 24 21:59:15 www2 kernel: [ 3780.233017] block drbd1:
uuid_compare()=-2 by rule 20
Aug 24 21:59:15 www2 kernel: [ 3780.233053] block drbd1: Becoming sync
target due to disk states.
Aug 24 21:59:15 www2 kernel: [ 3780.233091] block drbd1: Writing the
whole bitmap, full sync required after drbd_sync_handshake.
Aug 24 21:59:15 www2 kernel: [ 3780.287424] block drbd1: bitmap WRITE of
1852 pages took 10 jiffies
Aug 24 21:59:15 www2 kernel: [ 3780.348835] block drbd1: 231 GB
(60674971 bits) marked out-of-sync by on disk bit-map.
Aug 24 21:59:15 www2 kernel: [ 3780.357295] block drbd1: peer( Unknown
-> Primary ) conn( NetworkFailure -> WFBitMapT ) pdsk( DUnknown ->
UpToDate )
Aug 24 21:59:15 www2 kernel: [ 3780.365646] block drbd1: Connection closed
Aug 24 21:59:15 www2 kernel: [ 3780.365688] block drbd1: peer( Primary
-> Unknown ) conn( WFBitMapT -> Unconnected ) pdsk( UpToDate -> DUnknown )
Aug 24 21:59:15 www2 kernel: [ 3780.365731] block drbd1: receiver terminated
Aug 24 21:59:15 www2 kernel: [ 3780.365771] block drbd1: Restarting
drbd1_receiver
Aug 24 21:59:15 www2 kernel: [ 3780.365808] block drbd1: receiver
(re)started
Aug 24 21:59:15 www2 kernel: [ 3780.365871] block drbd1: conn(
Unconnected -> WFConnection )
Aug 24 21:59:15 www2 kernel: [ 3780.373914] block drbd1: bitmap WRITE of
0 pages took 0 jiffies
Aug 24 21:59:15 www2 kernel: [ 3780.374072] block drbd1: 231 GB
(60674971 bits) marked out-of-sync by on disk bit-map.
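
As a sanity check on the numbers in this log: the bitmap bit counts can be converted to sizes and compared with the oos value in the status output. The sketch below assumes the usual DRBD bitmap granularity of 4 KiB per bit; the function names are only for illustration.

```python
BITMAP_BLOCK_KIB = 4  # assumption: one bitmap bit tracks 4 KiB of data


def bits_to_kib(bits: int) -> int:
    """Convert a DRBD bitmap bit count into KiB."""
    return bits * BITMAP_BLOCK_KIB


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (the kernel log labels this unit 'GB')."""
    return kib / (1024 * 1024)


secondary_kib = bits_to_kib(60674971)  # "self" bits from the secondary's log
print(secondary_kib, int(kib_to_gib(secondary_kib)))
```

For the secondary this gives 242699884 KiB, which matches the oos:242699884 reported in its status and the "231 GB" in the log, so the whole device is marked out-of-sync as expected for a fresh node.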


As far as I understand it, they do establish a connection, agree on a
protocol, notice that the secondary needs to be fully synced and then just
drop the connection for no apparent reason.

Can you tell me why this might be, or where I can get further information
about why the connection is being dropped?


Thanks a lot

Koschi
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
