Re: [DRBD-user] Recovering from erroneous sync state

Zev Weiss Wed, 23 May 2012 14:07:36 -0700

On May 23, 2012, at 3:47 PM, Florian Haas wrote:

> On Wed, May 23, 2012 at 10:34 PM, Zev Weiss <[email protected]> wrote:
>> 
>> On May 23, 2012, at 3:22 PM, Florian Haas wrote:
>> 
>>> On Wed, May 23, 2012 at 10:14 PM, Zev Weiss <[email protected]> wrote:
>>>> Hi,
>>>> 
>>>> I'm running DRBD 8.3.12, and recently hit what looks to me like a bug that 
>>>> was listed as fixed in 8.3.13 -- getting into a state where both nodes are 
>>>> in SyncSource (it's just stuck like that, going nowhere).  Luckily this 
>>>> happened on a test resource and not a live one, so it's not a big problem, 
>>>> but I was wondering if there were any known ways of recovering it without 
>>>> doing anything disruptive to the other resources (e.g. rebooting or 
>>>> unloading the kernel module).
>>>> 
>>>> I've tried 'drbdadm down', but it just hangs -- anyone have any other 
>>>> suggestions?  It doesn't really matter to me if it wipes the resource or 
>>>> anything, I'd just like to have my test device back in a working state 
>>>> without disturbing anything else.
>>> 
>>> Can you post /proc/drbd contents from both nodes here?
>>> 
>> 
>> Sure -- here's one node:
>> 
>> version: 8.3.12 (api:88/proto:86-96)
>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss@mydomain, 
>> 2012-03-14 19:52:38
>> 
>> <snip other resources>
>>  9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
>>    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>>        [>...................] sync'ed:  5.9% (65536/65536)K
>>        finish: 19046:04:53 speed: 0 (0 -- 0) K/sec (stalled)
>>          0% sector pos: 0/10698352
>>        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>>        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
>> 
>> 
>> And here's the other:
>> 
>> version: 8.3.12 (api:88/proto:86-96)
>> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by 
>> [email protected], 2012-03-14 19:52:38
>> 
>> <snip other resources>
>>  9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
>>    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>>        [>...................] sync'ed:  5.9% (65536/65536)K
>>        finish: 18987:55:05 speed: 0 (0 -- 0) K/sec (stalled)
>>          0% sector pos: 0/10698352
>>        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>>        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> Ugh. Can you force the device into the WFConnection state by injecting
> a couple of iptables rules blocking the replication port, and then
> "down" the resource?


I've now inserted iptables rules on both nodes rejecting traffic on the 
relevant replication port (reject-with icmp-port-unreachable), but there's 
been no transition to WFConnection -- it's still stuck in the same state 
(and, unsurprisingly, 'down' still just hangs).
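
For reference, here is a rough sketch of the kind of rules described above, 
plus a small helper for reading the connection state back out of /proc/drbd. 
The exact rules used are not shown in this thread, so treat this as an 
illustration only: the function names block_rules and conn_state are made up 
for this example, and the port number 7789 in the usage note is an 
assumption (the real port is whatever the resource's configuration 
specifies). Minor number 9 is taken from the /proc/drbd excerpt quoted 
above. The rule-generating function only prints the commands; it does not 
run them.

```shell
#!/bin/sh
# Sketch only. The helper names are hypothetical, and any port passed in
# is an assumption; use the port from your own resource configuration.

# Print (but do not execute) iptables rules that reject TCP traffic to the
# given replication port, both inbound and outbound.
block_rules() {
    port="$1"
    echo "iptables -I INPUT -p tcp --dport $port -j REJECT --reject-with icmp-port-unreachable"
    echo "iptables -I OUTPUT -p tcp --dport $port -j REJECT --reject-with icmp-port-unreachable"
}

# Read /proc/drbd-style text on stdin and print the connection state
# (the value following "cs:") for the given minor number.
conn_state() {
    minor="$1"
    sed -n "s/^ *$minor: cs:\([A-Za-z]*\).*/\1/p"
}
```

Possible usage, assuming port 7789 and minor 9: run "block_rules 7789" to 
review the generated rules before applying them as root on each node, then 
check "conn_state 9 < /proc/drbd" to see whether the device has left 
SyncSource.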


Zev

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
