Hi,

I'm not sure if this is the right place to ask, as it could be a DRBD issue.

I have a CentOS 5.2 system with heartbeat-2.1.4-2.1 and
drbd82-8.2.6-1.el5.centos. I tested various scenarios during construction,
one being a reboot of the primary server, but now that the cluster is in
production it failed to fail over successfully.

The problem was that DRBD failed to promote to primary.
According to the logs, after the primary was rebooted:
        - slave heartbeat received shutdown notice from peer, then
 drbd1: role( Secondary -> Primary )
 drbd1: Writing meta data super block now.
 drbd1: State change failed: Refusing to be Primary while peer is not outdated
 drbd1:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }
 drbd1:  wanted = { cs:TearDown st:Primary/Unknown ds:UpToDate/DUnknown r--- }
 drbd1: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> Outdated )
 drbd1: Writing meta data super block now.
 drbd1: Creating new current UUID
 drbd1: Writing meta data super block now.
 drbd1: asender terminated
 drbd1: Terminating asender thread
 drbd1: tl_clear()
 drbd1: Connection closed
 drbd1: conn( TearDown -> Unconnected )
 drbd1: receiver terminated
 drbd1: receiver (re)started
 drbd1: conn( Unconnected -> WFConnection )
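
To make the refused transition easier to read, the "state" and "wanted" lines
can be compared field by field. This is only a minimal sketch in Python,
assuming the log format shown above; parse_state and diff_states are
hypothetical helpers written for this post, not part of DRBD or heartbeat.

```python
import re

# Hypothetical helper: matches the "{ cs:... st:.../... ds:.../... }" part
# of a DRBD 8.2 state line as it appears in the log excerpt above.
STATE_RE = re.compile(r"\{\s*cs:(\S+)\s+st:(\S+)/(\S+)\s+ds:(\S+)/(\S+)")


def parse_state(line):
    """Return the connection, role and disk fields of a state line, or None."""
    m = STATE_RE.search(line)
    if m is None:
        return None
    keys = ("conn", "role", "peer_role", "disk", "peer_disk")
    return dict(zip(keys, m.groups()))


def diff_states(current, wanted):
    """Return only the fields whose value differs between the two states."""
    return {k: (current[k], wanted[k]) for k in current if current[k] != wanted[k]}


state = parse_state(
    "drbd1:   state = { cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate r--- }"
)
wanted = parse_state(
    "drbd1:  wanted = { cs:TearDown st:Primary/Unknown ds:UpToDate/DUnknown r--- }"
)
changes = diff_states(state, wanted)
for field, (old, new) in changes.items():
    print(f"{field}: {old} -> {new}")
```

For the two lines above, the fields that differ are the connection state, the
peer role and the peer disk state. The requested peer disk state is DUnknown
rather than Outdated, which seems to match the "Refusing to be Primary while
peer is not outdated" message.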

I suspect dopd had something to do with the failure of the drbd resource
takeover.
Does anyone know what might have happened?

Thanks,
Jai



drbd.conf

global { 
  usage-count yes; 
}
common {
  protocol C;
  startup {
          wfc-timeout 0;
          degr-wfc-timeout 120; # 2 minutes
  }
  syncer {
         rate 110M;
         al-extents 257;
  }
  net {
         cram-hmac-alg "sha1";
         shared-secret "secret";
         after-sb-0pri disconnect;
         after-sb-1pri consensus;
         after-sb-2pri disconnect;
         rr-conflict disconnect;
  }
  disk {
        fencing resource-only;
        on-io-error detach;
        #max-bio-bvecs 1;
  }
  handlers {
         # what should be done in case the node is primary, degraded
         # (=no connection) and has inconsistent data.
         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";

         # The node is currently primary, but lost the after split brain
         # auto recovery procedure. As a consequence it should go away.
         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";

         # In case you have set the on-io-error option to "call-local-io-error",
         # this script will get executed in case of a local IO error. It is
         # expected that this script will cause an immediate failover in the
         # cluster.
         local-io-error "echo o > /proc/sysrq-trigger ; halt -f";

         # Commands to run in case we need to downgrade the peer's disk
         # state to "Outdated". Should be implemented by the superior
         # communication possibilities of our cluster manager.
         # The provided script uses ssh, and is for demonstration/development
         # purposes.
         # Update: Now there is a solution that relies on heartbeat's
         # communication layers. You should really use this.
         outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
}
resource r0 {
         device    /dev/drbd0;
         disk      /dev/VGxen/LV1;
         meta-disk internal;
  on h1 {
         address   172.16.0.1:7788;
  }
  on h2 {
         address   172.16.0.2:7788;
  }
}
resource r1 {
         device    /dev/drbd1;
         disk      /dev/VGxen/LV2;
         meta-disk internal;
  on h1 {
         address   172.16.0.1:7789;
  }
  on h2 {
         address   172.16.0.2:7789;
  }
}
resource r2 {
         device    /dev/drbd2;
         disk      /dev/VGxen/LV3;
         meta-disk internal;
  on h1 {
         address   172.16.0.1:7790;
  }
  on h2 {
         address   172.16.0.2:7790;
  }
}


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
