I've read your mail very quickly, so bear that in mind when reading below.
If you want to lose the data on vm2, put drbd0 on vm2 into secondary (if
it isn't already), then on vm2 run drbdadm invalidate resourcenamefordrbd0
or drbdadm outdate resourcenamefordrbd0. The latter discards only the
metadata, while invalidate discards all data on drbd0 on vm2.
But double check that you really want to discard the data on vm2 before
doing that. Also, don't put drbd0 on vm1 into secondary if you want to
preserve its data.
Invalidate remote won't work while they are disconnected. :-)
Connect drbd after invalidating, and it should then sync successfully.
Also double check your drbd.conf for an option that keeps the data from
the device that became master later. It might not be what you want.
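As a rough sketch of the sequence above, to be run on vm2 only: the
resource name r0 is a placeholder (use the name from your drbd.conf), and
the DRY_RUN switch and the run helper are my own additions so that nothing
is executed until you have reviewed the commands. I have not tested this
against 0.7.21, so treat it as an outline rather than a recipe.

```shell
#!/bin/sh
# Sketch only: discard vm2's copy of the data and resync it from vm1.
# Run on vm2, never on vm1.
# RES is a placeholder resource name; take the real one from drbd.conf.
RES="${RES:-r0}"
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

resync_from_peer() {
    # 1. Make sure this node is secondary.
    run drbdadm secondary "$RES" &&
    # 2. Mark the local data as invalid so it gets overwritten by the peer.
    run drbdadm invalidate "$RES" &&
    # 3. Reconnect; vm1 also has to be told to connect, since it is StandAlone.
    run drbdadm connect "$RES"
}

resync_from_peer
```

With the defaults this only prints the three commands. Once they look
right, set DRY_RUN=0 on vm2, then run drbdadm connect on vm1 as well.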
Regards,
M.
Adam Sweet wrote:
Hi
I'm sorry to make my first post here a request for help, but I've
inherited a system using DRBD with Heartbeat, something strange
happened and I don't have the experience with either to work out how
to fix it.
The systems are two Xen instances running Debian Lenny with DRBD
0.7.21 and Heartbeat 2.1.3 installed from the stock Lenny repositories,
working as a mailing list server.
About a week ago, something weird happened. It may have been caused by
a routing issue at the hosting provider which was detected a few days
later. One of the Xen instances, the DRBD secondary, hereafter called
vm2, shut down unexpectedly overnight and the following was logged on
the primary, hereafter called vm1:
Jan 4 18:39:33 lists1 kernel: drbd0: PingAck did not arrive in time.
Jan 4 18:39:33 lists1 kernel: drbd0: drbd0_asender [16239]: cstate Connected --> NetworkFailure
Jan 4 18:39:33 lists1 kernel: drbd0: asender terminated
Jan 4 18:39:33 lists1 kernel: drbd0: drbd0_receiver [2577]: cstate NetworkFailure --> BrokenPipe
Jan 4 18:39:33 lists1 kernel: drbd0: short read expecting header on sock: r=-512
Jan 4 18:39:33 lists1 kernel: drbd0: worker terminated
Jan 4 18:39:33 lists1 kernel: drbd0: drbd0_receiver [2577]: cstate BrokenPipe --> Unconnected
Jan 4 18:39:33 lists1 kernel: drbd0: Connection lost.
Jan 4 18:39:33 lists1 kernel: drbd0: drbd0_receiver [2577]: cstate Unconnected --> WFConnection
Jan 4 18:41:37 lists1 kernel: drbd0: drbd0_receiver [2577]: cstate WFConnection --> WFReportParams
Jan 4 18:41:37 lists1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Jan 4 18:41:37 lists1 kernel: drbd0: incompatible states (both Primary!)
Jan 4 18:41:37 lists1 kernel: drbd0: drbd0_receiver [2577]: cstate WFReportParams --> StandAlone
Jan 4 18:41:37 lists1 kernel: drbd0: error receiving ReportParams, l: 72!
Jan 4 18:41:37 lists1 kernel: drbd0: worker terminated
Jan 4 18:41:37 lists1 kernel: drbd0: asender terminated
Jan 4 18:41:37 lists1 kernel: drbd0: drbd0_receiver [2577]: cstate StandAlone --> StandAlone
Jan 4 18:41:37 lists1 kernel: drbd0: Connection lost.
Jan 4 18:41:37 lists1 kernel: drbd0: receiver terminated
When we brought vm2 (the secondary) up in the morning, it assumed the
primary Heartbeat and DRBD role even though vm1 also held the primary
role, so I powered vm2 off again. It seems that Heartbeat on vm1 had
died at some point, so we started it again and it warned about
resources already being in use (the Heartbeat-controlled services,
Postgres, Postfix, DRBD, Sympa, crond and atd, were still running
despite Heartbeat dying). Afterwards we could bring vm2 up without it
assuming active Heartbeat status and the DRBD primary role.
Currently we are here:
cat /proc/drbd:
vm1:
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by r...@vm136, 2008-08-14 08:57:36
0: cs:StandAlone st:Primary/Unknown ld:Consistent
ns:0 nr:0 dw:564848848 dr:244366434 al:640787 bm:72145 lo:0 pe:0 ua:0 ap:0
vm2:
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by r...@vm137, 2008-08-14 08:57:36
0: cs:WFConnection st:Secondary/Unknown ld:Consistent
ns:0 nr:0 dw:0 dr:0 al:0 bm:127 lo:0 pe:0 ua:0 ap:0
vm1 is currently running the Heartbeat-controlled services, has the
shared IP address and has the DRBD volume mounted and in use.
When we tell vm1 to connect, we see this:
Jan 12 14:56:45 lists1 kernel: drbd0: drbdsetup [26191]: cstate StandAlone --> Unconnected
Jan 12 14:56:45 lists1 kernel: drbd0: drbd0_receiver [26192]: cstate Unconnected --> WFConnection
Jan 12 14:56:47 lists1 kernel: drbd0: drbd0_receiver [26192]: cstate WFConnection --> WFReportParams
Jan 12 14:56:47 lists1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Jan 12 14:56:47 lists1 kernel: drbd0: Connection established.
Jan 12 14:56:47 lists1 kernel: drbd0: I am(P): 1:00000002:00000003:0000004f:00000008:10
Jan 12 14:56:47 lists1 kernel: drbd0: Peer(S): 1:00000002:00000004:0000004a:00000009:10
Jan 12 14:56:47 lists1 kernel: drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
Jan 12 14:56:47 lists1 kernel: drbd0: drbd0_receiver [26192]: cstate WFReportParams --> StandAlone
Jan 12 14:56:47 lists1 kernel: drbd0: error receiving ReportParams, l: 72!
Jan 12 14:56:47 lists1 kernel: drbd0: worker terminated
Jan 12 14:56:47 lists1 kernel: drbd0: asender terminated
Jan 12 14:56:47 lists1 kernel: drbd0: drbd0_receiver [26192]: cstate StandAlone --> StandAlone
Jan 12 14:56:47 lists1 kernel: drbd0: Connection lost.
Jan 12 14:56:47 lists1 kernel: drbd0: receiver terminated
When we tell vm1 to connect and invalidate the remote system, it tells
us that it can only be done when it's connected, but as above, we
can't get it to connect.
I've looked at the documentation and googled some existing mailing
list posts, such as this one:
http://archives.free.net.ph/message/20060619.131041.fd07cb48.en.html
but as this is a busy live system and the customer keeps a close eye
on it, I'm reluctant to try anything which might lead to some lengthy
downtime for a restore and a list of explanations and apologies to the
customer. I'd prefer to ask for your opinion rather than take a guess
at a fix.
Can anybody help? If you need more info, please ask and I will be
happy to provide.
Regards,
Adam Sweet
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user