The first thing that jumps out at me is that round-robin bonding is not supported; only mode=1 (active-backup, i.e. Active/Passive) is. Second, you do not have fencing, so when the network error occurred, you got a split-brain:

> Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
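
Before and after changing the bond, it can help to confirm which mode the kernel is actually using. The sketch below reads the bonding driver's status file and reports the mode. It assumes the bond is named bond0, which the original post does not state; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the mode of a Linux bonding interface.
# The default path assumes the bond is called bond0 (an assumption,
# the thread does not name the interface).

# Print the text after "Bonding Mode: " from the status file given as $1.
bond_mode() {
    sed -n 's/^Bonding Mode: //p' "${1:-/proc/net/bonding/bond0}"
}

# Succeed only when the bond is in active-backup mode (mode=1).
is_active_backup() {
    case "$(bond_mode "$1")" in
        *active-backup*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running `bond_mode` on each node should show a round-robin mode now and an active-backup mode once the bond has been switched to mode=1.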

So, switch your bonding to mode=1 and then follow the instructions to resolve a split-brain.

http://www.drbd.org/users-guide-8.3/s-resolve-split-brain.html
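
For reference, the manual recovery described at that link comes down to a few drbdadm calls. The sketch below is a minimal outline, not a tested procedure for this cluster. It assumes the split-brain resource is r0 (drbd0 in the logs), that you have already decided which node's changes to throw away, and that nothing on that node (VMs, LVM) still holds the device open. The function names and the DRBDADM variable are made up for illustration.

```shell
#!/bin/sh
# Sketch of manual split-brain recovery for DRBD 8.3.
# DRBDADM is an illustrative override so the commands can be dry-run
# (for example DRBDADM=echo); it is not a DRBD setting.
DRBDADM="${DRBDADM:-drbdadm}"

# Run on the node whose changes will be DISCARDED (the "victim").
# Demotes the resource, then reconnects telling DRBD to drop local changes.
discard_local_changes() {
    "$DRBDADM" secondary "$1" &&
    "$DRBDADM" -- --discard-my-data connect "$1"
}

# Run on the node whose data is kept (the "survivor").
# Only needed when that node is also StandAlone, as both nodes are
# in the logs quoted in this thread.
reconnect_survivor() {
    "$DRBDADM" connect "$1"
}
```

On the victim you would call `discard_local_changes r0`, on the survivor `reconnect_survivor r0`, and then watch /proc/drbd until the resource is Connected and UpToDate/UpToDate again.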

Once this is sorted out, configure and use actual fencing (stonith in pacemaker terms).
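
On the DRBD side, fencing is switched on with a "fencing" policy in the disk section plus a fence-peer handler; without one, DRBD falls back to its default policy, dont-care. As a quick way to see where a resource stands, the sketch below reads configuration text on stdin and prints the fencing policy it declares. The function name is made up for illustration, and the parsing is deliberately simple: it only looks for the first "fencing <policy>;" statement.

```shell
#!/bin/sh
# Sketch: print the DRBD fencing policy found in config text on stdin.
# Prints "dont-care" (DRBD's default) when no fencing statement exists.
fencing_policy() {
    # Match "fencing <policy>;" at line start or after whitespace, "{" or ";".
    policy=$(sed -n 's/^\(.*[[:space:]{;]\)\{0,1\}fencing[[:space:]][[:space:]]*\([a-z-]*\)[[:space:]]*;.*$/\2/p' | head -n 1)
    echo "${policy:-dont-care}"
}
```

For example, `drbdadm dump r0 | fencing_policy` should print dont-care for the configuration posted in this thread and resource-and-stonith once real fencing is configured.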

digimer

On 06/15/2013 04:29 AM, cesar wrote:
Hello everyone

*Please, this is urgent; my servers are in production*

I have a serious problem and need help

*My scenario*
- I have two ASUS P8H77-M PRO workstations with Intel Core i7, Proxmox VE
2.3, DRBD 8.3.10, LVM on top of DRBD
- 2 Realtek RTL8111/8168 PCI-E 1 Gb/s NICs in a round-robin bond, used only
for DRBD

After a while it shows me this:

shell# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by root@sighted, 2012-10-09 12:47:51
  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
     ns:237256 nr:307093 dw:307093 dr:690264 al:0 bm:321 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
  1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
     ns:0 nr:467984 dw:467984 dr:537932 al:0 bm:13 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

*This is my configuration:*

*File global_common.conf:*
global {
         usage-count no;
}

common {
         protocol C;

         handlers {
                 pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                 pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                 local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                 split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                 out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
         }

         startup {
         }

         disk {
                 on-io-error detach;
         }

         net {
                 sndbuf-size 0;
                 no-tcp-cork;
                 unplug-watermark 16;
                 max-buffers 8000;
                 max-epoch-size 8000;
                 data-integrity-alg sha1;
         }

         syncer {
                 rate 75M;
                 al-extents 3389;
                 cpu-mask 0;
                 verify-alg "sha1";
         }
}

*File r0.res:*
resource r0 {
   protocol C;
   startup {
     wfc-timeout 15;
     degr-wfc-timeout 60;
     become-primary-on both;
   }
   net {
     allow-two-primaries;
     after-sb-0pri discard-zero-changes;
     after-sb-1pri discard-secondary;
     after-sb-2pri disconnect;
   }
   on kvm5 {
     device /dev/drbd0;
     disk /dev/sda3;
     address 10.2.2.50:7788;
     meta-disk internal;
   }
   on kvm6 {
     device /dev/drbd0;
     disk /dev/sda3;
     address 10.2.2.51:7788;
     meta-disk internal;
   }
}

*File r1.res:*
resource r1 {
   protocol C;
   startup {
     wfc-timeout 15;
     degr-wfc-timeout 60;
     become-primary-on both;
   }
   net {
     allow-two-primaries;
     after-sb-0pri discard-zero-changes;
     after-sb-1pri discard-secondary;
     after-sb-2pri disconnect;
   }
   on kvm5 {
     device /dev/drbd1;
     disk /dev/sdb3;
     address 10.2.2.50:7789;
     meta-disk internal;
   }
   on kvm6 {
     device /dev/drbd1;
     disk /dev/sdb3;
     address 10.2.2.51:7789;
     meta-disk internal;
   }
}

*Note:*
I use "data-integrity-alg sha1;" in the net section because data integrity
is very important to me

*These are my logs:*

*Log in Node A:*
Jun 14 08:07:28 kvm5 kernel: dlm: connecting to 4
Jun 14 08:50:12 kvm5 kernel: block drbd0: Digest mismatch, buffer modified by upper layers during write: 21158352s +4096
Jun 14 08:50:12 kvm5 kernel: block drbd0: sock was reset by peer
Jun 14 08:50:12 kvm5 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm5 kernel: block drbd0: short read expecting header on sock: r=-104
Jun 14 08:50:12 kvm5 kernel: block drbd0: meta connection shut down by peer.
Jun 14 08:50:12 kvm5 kernel: block drbd0: new current UUID 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
Jun 14 08:50:12 kvm5 kernel: block drbd0: asender terminated
Jun 14 08:50:12 kvm5 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:12 kvm5 kernel: block drbd0: Connection closed
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver terminated
Jun 14 08:50:12 kvm5 kernel: block drbd0: Restarting receiver thread
Jun 14 08:50:12 kvm5 kernel: block drbd0: receiver (re)started
Jun 14 08:50:12 kvm5 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm5 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1847])
Jun 14 08:50:13 kvm5 kernel: block drbd0: data-integrity-alg: sha1
Jun 14 08:50:13 kvm5 kernel: block drbd0: drbd_sync_handshake:
Jun 14 08:50:13 kvm5 kernel: block drbd0: self 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
Jun 14 08:50:13 kvm5 kernel: block drbd0: peer CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
Jun 14 08:50:13 kvm5 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm5 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 14 08:50:13 kvm5 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Jun 14 08:50:13 kvm5 kernel: block drbd0: error receiving ReportState, l: 4!
Jun 14 08:50:13 kvm5 kernel: block drbd0: asender terminated
Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:13 kvm5 kernel: block drbd0: Connection closed
Jun 14 08:50:13 kvm5 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 14 08:50:13 kvm5 kernel: block drbd0: receiver terminated
Jun 14 08:50:13 kvm5 kernel: block drbd0: Terminating receiver thread

*Log in node B:*
Jun 14 08:07:28 kvm6 kernel: dlm: Using TCP for communications
Jun 14 08:07:28 kvm6 kernel: dlm: got connection from 3
Jun 14 08:50:12 kvm6 kernel: block drbd0: Digest integrity check FAILED: 21158352s +4096
Jun 14 08:50:12 kvm6 kernel: block drbd0: error receiving Data, l: 4140!
Jun 14 08:50:12 kvm6 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown )
Jun 14 08:50:12 kvm6 kernel: block drbd0: new current UUID CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D
Jun 14 08:50:12 kvm6 kernel: block drbd0: asender terminated
Jun 14 08:50:12 kvm6 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:12 kvm6 kernel: block drbd0: Connection closed
Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( ProtocolError -> Unconnected )
Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver terminated
Jun 14 08:50:12 kvm6 kernel: block drbd0: Restarting receiver thread
Jun 14 08:50:12 kvm6 kernel: block drbd0: receiver (re)started
Jun 14 08:50:12 kvm6 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 14 08:50:13 kvm6 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 14 08:50:13 kvm6 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1857])
Jun 14 08:50:13 kvm6 kernel: block drbd0: data-integrity-alg: sha1
Jun 14 08:50:13 kvm6 kernel: block drbd0: drbd_sync_handshake:
Jun 14 08:50:13 kvm6 kernel: block drbd0: self CF68F4906E4001C5:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:0 flags:0
Jun 14 08:50:13 kvm6 kernel: block drbd0: peer 76A887AA443E0DBB:15B9E4140BB5F41B:48B8F43E491AA38D:48B7F43E491AA38D bits:99 flags:0
Jun 14 08:50:13 kvm6 kernel: block drbd0: uuid_compare()=100 by rule 90
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm6 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Jun 14 08:50:13 kvm6 kernel: block drbd0: meta connection shut down by peer.
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( WFReportParams -> NetworkFailure )
Jun 14 08:50:13 kvm6 kernel: block drbd0: asender terminated
Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating asender thread
Jun 14 08:50:13 kvm6 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( NetworkFailure -> Disconnecting )
Jun 14 08:50:13 kvm6 kernel: block drbd0: error receiving ReportState, l: 4!
Jun 14 08:50:13 kvm6 kernel: block drbd0: Connection closed
Jun 14 08:50:13 kvm6 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 14 08:50:13 kvm6 kernel: block drbd0: receiver terminated
Jun 14 08:50:13 kvm6 kernel: block drbd0: Terminating receiver thread



I will be extremely grateful to anyone who can help me

Best regards
Cesar




--
View this message in context: 
http://drbd.10923.n7.nabble.com/Replication-problems-constants-with-DRBD-8-3-10-tp17896.html
Sent from the DRBD - User mailing list archive at Nabble.com.
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user



--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?