On Wed, May 23, 2012 at 10:34 PM, Zev Weiss <[email protected]> wrote:
>
> On May 23, 2012, at 3:22 PM, Florian Haas wrote:
>
>> On Wed, May 23, 2012 at 10:14 PM, Zev Weiss <[email protected]> wrote:
>>> Hi,
>>>
>>> I'm running DRBD 8.3.12, and recently hit what looks to me like a bug that 
>>> was listed as fixed in 8.3.13 -- getting into a state where both nodes are 
>>> in SyncSource (it's just stuck like that, going nowhere).  Luckily this 
>>> happened on a test resource and not a live one, so it's not a big problem, 
>>> but I was wondering if there were any known ways of recovering it without 
>>> doing anything disruptive to the other resources (e.g. rebooting or 
>>> unloading the kernel module).
>>>
>>> I've tried 'drbdadm down', but it just hangs -- anyone have any other 
>>> suggestions?  It doesn't really matter to me if it wipes the resource or 
>>> anything, I'd just like to have my test device back in a working state 
>>> without disturbing anything else.
>>
>> Can you post /proc/drbd contents from both nodes here?
>>
>
> Sure -- here's one node:
>
> version: 8.3.12 (api:88/proto:86-96)
> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by zweiss@mydomain, 
> 2012-03-14 19:52:38
>
> <snip other resources>
>  9: cs:SyncSource ro:Secondary/Primary ds:UpToDate/Inconsistent C r-----
>    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>        [>...................] sync'ed:  5.9% (65536/65536)K
>        finish: 19046:04:53 speed: 0 (0 -- 0) K/sec (stalled)
>          0% sector pos: 0/10698352
>        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0
>
>
> And here's the other:
>
> version: 8.3.12 (api:88/proto:86-96)
> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by 
> [email protected], 2012-03-14 19:52:38
>
> <snip other resources>
>  9: cs:SyncSource ro:Secondary/Secondary ds:UpToDate/Inconsistent C r-----
>    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:65536
>        [>...................] sync'ed:  5.9% (65536/65536)K
>        finish: 18987:55:05 speed: 0 (0 -- 0) K/sec (stalled)
>          0% sector pos: 0/10698352
>        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
>        act_log: used:0/3389 hits:0 misses:0 starving:0 dirty:0 changed:0

Ugh. Can you force the device into the WFConnection state by injecting
a couple of iptables rules blocking the replication port, and then
"down" the resource?
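
Roughly what I have in mind, as an untested sketch. The resource name
r9 and port 7797 are placeholders I made up (take the real ones from
your drbd.conf); minor 9 is from your /proc/drbd output. By default it
is a dry run that only prints the commands:

```shell
#!/bin/sh
# Untested sketch. RES and PORT are made-up placeholders; take the real
# values from the resource's section in drbd.conf. MINOR is the device
# number shown in the /proc/drbd output quoted above.
RES="${RES:-r9}"
PORT="${PORT:-7797}"
MINOR="${MINOR:-9}"
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands, 0: run them (as root)

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is -I (insert) or -D (delete). Drops replication traffic in both
# directions, whichever side opened the TCP connection.
drbd_port_rules() {
    for chain in INPUT OUTPUT; do
        for match in --dport --sport; do
            run iptables "$1" "$chain" -p tcp "$match" "$PORT" -j DROP
        done
    done
}

# Poll /proc/drbd until the device reports WFConnection (at most 60 tries).
wait_for_wfconnection() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "# wait for '${MINOR}: cs:WFConnection' in /proc/drbd"
        return 0
    fi
    tries=0
    until grep -Eq "^ *${MINOR}: cs:WFConnection" /proc/drbd; do
        tries=$((tries + 1))
        [ "$tries" -ge 60 ] && return 1
        sleep 1
    done
}

drbd_port_rules -I                 # cut the replication link
if wait_for_wfconnection; then
    run drbdadm down "$RES"        # the step that hung before
fi
drbd_port_rules -D                 # always remove the rules again
```

Once the printed commands look right for your setup, run it with
DRY_RUN=0 as root on the node where "down" hangs. Whether the device
actually drops to WFConnection depends on DRBD noticing the dead link,
so no promises.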

Also, Lars, can you shed a little more light on the bug and its
8.3.13 fix? I had thought the fix was commit 305dce2c, but that commit
apparently fixes a problem introduced by c19050f4 (which, as per git
describe, landed some thirty commits after 8.3.12, so it shouldn't
affect an 8.3.12 user).

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
