[Linux-HA] Heartbeat and drbd

Pierguido Wed, 05 Mar 2008 03:14:58 -0800

Hi all.
I'm trying to set up a simple two-node cluster.
Each node is master for a few services, and in case of failure those
services will be transferred to the other node.
Node1 will run a mail and db server; node2 will run dns and web.
For this I made four drbd block devices, one for each service.
I'm using heartbeat 2.1.3 with crm (v2) to manage the services, and drbd
8.2.4.
Until yesterday there were no problems... the servers were running
smoothly.
Then I tried to tune drbd a bit, and that is when the problems started.
Every time I changed something in drbd, I cleaned the crm config,
shut down heartbeat, and made sure all the services were stopped.
After every change, I tried running drbd on its own and never had any
problems.
Then when I start heartbeat (which should manage the drbd devices), I see
many errors in crm_mon, so I check drbd and get this:



server1:

srv-clu-1:~# cat /proc/drbd
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by [EMAIL PROTECTED], 2008-01-11 13:40:26
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:5353476 nr:0 dw:4 dr:5353713 al:0 bm:674 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:334255 misses:337 starving:0 dirty:0 changed:337
        act_log: used:0/1801 hits:1 misses:0 starving:0 dirty:0 changed:0

 3: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:2312 dr:4041 al:1 bm:77 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/1801 hits:577 misses:1 starving:0 dirty:0 changed:1


server2:

srv-clu-2:~# cat /proc/drbd
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by [EMAIL PROTECTED], 2008-01-11 13:40:26
 0: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:5353476 dw:5353476 dr:0 al:0 bm:337 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:334255 misses:337 starving:0 dirty:0 changed:337
        act_log: used:0/1801 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:4 dr:65 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/1801 hits:1 misses:0 starving:0 dirty:0 changed:0
 2: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:344 dr:109 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/1801 hits:86 misses:0 starving:0 dirty:0 changed:0
 3: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/1801 hits:0 misses:0 starving:0 dirty:0 changed:0
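
For anyone who wants to compare output like the above on both nodes without
reading it by eye, here is a minimal sketch in Python that picks out the
devices whose connection state is not Connected. It assumes the 8.2.x
/proc/drbd layout shown in this mail (cs:, st: and ds: fields on the device
line); the names parse_proc_drbd and not_connected are made up for this
example and are not part of drbd or heartbeat.

```python
import re

# Matches a device line in the format pasted above, for example:
#   " 3: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---"
# Assumption: DRBD 8.2.x field names (cs, st, ds); other versions may differ.
DEVICE_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ds:([^/\s]+)/(\S+)"
)


def parse_proc_drbd(text):
    """Return a dict mapping each device minor number to its state fields."""
    devices = {}
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match is None:
            continue  # version line, counters, resync/act_log lines
        minor, cs, role, peer_role, disk, peer_disk = match.groups()
        devices[int(minor)] = {
            "cs": cs,                # connection state
            "role": role,            # local role (Primary/Secondary)
            "peer_role": peer_role,  # role of the peer, or Unknown
            "disk": disk,            # local disk state
            "peer_disk": peer_disk,  # disk state of the peer, or DUnknown
        }
    return devices


def not_connected(devices):
    """Return the sorted minor numbers whose connection state is not Connected."""
    return sorted(m for m, d in devices.items() if d["cs"] != "Connected")


# Sample input: the two device lines from srv-clu-1 above.
SAMPLE = """\
version: 8.2.4 (api:88/proto:86-88)
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:5353476 nr:0 dw:4 dr:5353713 al:0 bm:674 lo:0 pe:0 ua:0 ap:0
 3: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:2312 dr:4041 al:1 bm:77 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    print(not_connected(parse_proc_drbd(SAMPLE)))  # prints [3]
```

On a real node the same functions could be fed the contents of /proc/drbd
instead of SAMPLE; running that on both nodes makes it easy to see which
devices disagree.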

I cleaned everything again... resynced all the drbd devices (I never
mounted a filesystem on them), restarted all the servers and started
heartbeat again. Everything seems OK, but when I check drbd I find:


 1: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/1801 hits:0 misses:0 starving:0 dirty:0 changed:0

It shouldn't be possible... I forced the sync on all of them to make them
UpToDate.
Why this error?
Has anybody had a similar experience?
Thanks all.

Pier
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
