On Tuesday 31 January 2006 07:58, Todd Denniston wrote:
> Both machines panic-ed?
> or did one panic and the other halt with a message of "!DRBD! pri on
> incon-degr"?
That's possible... I don't remember the exact error message but perhaps you're
right that it was not a panic but it did halt.
> if in your drbd.conf disk{} section you have
> on-io-error panic;
> I would expect the node which lost it's disk to panic and the other to keep
> on going. [I count on this with a couple of mildly reliable Promise RM8000
> arrays, and have seen it do it's job.]
Interesting.... We have on-io-error detach. Should we change that to "panic"?
>
> As Alan said, set the ko-count and on-disconnect values in the net{}
> section of drbd.conf, and include your conf in the next email so we know
> what you are set to.
Here's our drbd.conf & ha.cf attached (some things have been tweaked since the
test I described, but nothing major).
> BTW which protocol are you using?
Protocol c.
Thank you,
Nate
> _______________________________________________
> Linux-HA mailing list
> [EMAIL PROTECTED]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
# $Id: ha.cf.in 7546 2006-01-30 22:23:50Z nreed $
#
# There are lots of options in this file. All you have to have is a set
# of nodes listed {"node ...} one of {serial, bcast, mcast, or ucast},
# and a value for "auto_failback".
#
# ATTENTION: As the configuration file is read line by line,
# THE ORDER OF DIRECTIVE MATTERS!
#
# In particular, make sure that the udpport, serial baud rate
# etc. are set before the heartbeat media are defined!
# debug and log file directives go into effect when they
# are encountered.
#
# All will be fine if you keep them ordered as in this example.
#
#
# Note on logging:
# If any of debugfile, logfile and logfacility are defined then they
# will be used. If debugfile and/or logfile are not defined and
# logfacility is defined then the respective logging and debug
# messages will be loged to syslog. If logfacility is not defined
# then debugfile and logfile will be used to log messges. If
# logfacility is not defined and debugfile and/or logfile are not
# defined then defaults will be used for debugfile and logfile as
# required and messages will be sent there.
#
# File to write debug messages to
#debugfile /var/log/ha-debug
#
#
# File to write other messages to
#
#logfile /var/log/ha-log
#
#
# Facility to use for syslog()/logger
#
#logfacility local7
#
#
# A note on specifying "how long" times below...
#
# The default time unit is seconds
# 10 means ten seconds
#
# You can also specify them in milliseconds
# 1500ms means 1.5 seconds
#
#
# keepalive: how long between heartbeats?
#
keepalive 2
#
# deadtime: how long-to-declare-host-dead?
#
# If you set this too low you will get the problematic
# split-brain (or cluster partition) problem.
# See the FAQ for how to use warntime to tune deadtime.
#
deadtime 30
#
# warntime: how long before issuing "late heartbeat" warning?
# See the FAQ for how to use warntime to tune deadtime.
#
warntime 5
#
#
# Very first dead time (initdead)
#
# On some machines/OSes, etc. the network takes a while to come up
# and start working right after you've been rebooted. As a result
# we have a separate dead time for when things first come up.
# It should be at least twice the normal dead time.
#
initdead 120
#
#
# What UDP port to use for bcast/ucast communication?
#
udpport 694
#
# Baud rate for serial ports...
#
#baud 19200
#
# serial serialportname ...
#serial /dev/ttyS0 # Linux
#serial /dev/cuaa0 # FreeBSD
#serial /dev/cua/a # Solaris
#
#
# What interfaces to broadcast heartbeats over?
#
#bcast eth0 # Linux
bcast eth2 # Linux
#bcast le0 # Solaris
#bcast le1 le2 # Solaris
#
# Set up a multicast heartbeat medium
# mcast [dev] [mcast group] [port] [ttl] [loop]
#
# [dev] device to send/rcv heartbeats on
# [mcast group] multicast group to join (class D multicast address
# 224.0.0.0 - 239.255.255.255)
# [port] udp port to sendto/rcvfrom (set this value to the
# same value as "udpport" above)
# [ttl] the ttl value for outbound heartbeats. this effects
# how far the multicast packet will propagate. (0-255)
# Must be greater than zero.
# [loop] toggles loopback for outbound multicast heartbeats.
# if enabled, an outbound packet will be looped back and
# received by the interface it was sent on. (0 or 1)
# Set this value to zero.
#
#
#mcast eth0 225.0.0.1 694 1 0
#
# Set up a unicast / udp heartbeat medium
# ucast [dev] [peer-ip-addr]
#
# [dev] device to send/rcv heartbeats on
# [peer-ip-addr] IP address of peer to send packets to
#
#ucast eth0 192.168.1.2
#
#
# About boolean values...
#
# Any of the following case-insensitive values will work for true:
# true, on, yes, y, 1
# Any of the following case-insensitive values will work for false:
# false, off, no, n, 0
#
#
#
# auto_failback: determines whether a resource will
# automatically fail back to its "primary" node, or remain
# on whatever node is serving it until that node fails, or
# an administrator intervenes.
#
# The possible values for auto_failback are:
# on - enable automatic failbacks
# off - disable automatic failbacks
# legacy - enable automatic failbacks in systems
# where all nodes do not yet support
# the auto_failback option.
#
# auto_failback "on" and "off" are backwards compatible with the old
# "nice_failback on" setting.
#
# See the FAQ for information on how to convert
# from "legacy" to "on" without a flash cut.
# (i.e., using a "rolling upgrade" process)
#
# The default value for auto_failback is "legacy", which
# will issue a warning at startup. So, make sure you put
# an auto_failback directive in your ha.cf file.
# (note: auto_failback can be any boolean or "legacy")
#
auto_failback off
#
#
# Basic STONITH support
# Using this directive assumes that there is one stonith
# device in the cluster. Parameters to this device are
# read from a configuration file. The format of this line is:
#
# stonith <stonith_type> <configfile>
#
# NOTE: it is up to you to maintain this file on each node in the
# cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
# STONITH support
# You can configure multiple stonith devices using this directive.
# The format of the line is:
# stonith_host <hostfrom> <stonith_type> <params...>
# <hostfrom> is the machine the stonith device is attached
# to or * to mean it is accessible from any host.
# <stonith_type> is the type of stonith device (a list of
# supported drives is in /usr/lib/stonith.)
# <params...> are driver specific parameters. To see the
# format for a particular device, run:
# stonith -l -t <stonith_type>
#
#
# Note that if you put your stonith device access information in
# here, and you make this file publically readable, you're asking
# for a denial of service attack ;-)
#
# To get a list of supported stonith devices, run
# stonith -L
# For detailed information on which stonith devices are supported
# and their detailed configuration options, run this command:
# stonith -h
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
stonith_host * wti_nps @POWER_SWITCH_IPADDR@ @POWER_SWITCH_PASSWORD@
#
# Watchdog is the watchdog timer. If our own heart doesn't beat for
# a minute, then our machine will reboot.
# NOTE: If you are using the software watchdog, you very likely
# wish to load the module with the parameter "nowayout=0" or
# compile it without CONFIG_WATCHDOG_NOWAYOUT set. Otherwise even
# an orderly shutdown of heartbeat will trigger a reboot, which is
# very likely NOT what you want.
#
#watchdog /dev/watchdog
#
# Tell what machines are in the cluster
# node nodename ... -- must match uname -n
node node1.cluster1.test.awarix.com
node node2.cluster1.test.awarix.com
#
# Less common options...
#
# Treats 10.10.10.254 as a psuedo-cluster-member
# Used together with ipfail below...
#
ping @PING_ROUTER@
#
# Treats 10.10.10.254 and 10.10.10.253 as a psuedo-cluster-member
# called group1. If either 10.10.10.254 or 10.10.10.253 are up
# then group1 is up
# Used together with ipfail below...
#
#ping_group group1 10.10.10.254 10.10.10.253
#
# HBA ping derective for Fiber Channel
# Treats fc-card-name as psudo-cluster-member
# used with ipfail below ...
#
# You can obtain HBAAPI from http://hbaapi.sourceforge.net. You need
# to get the library specific to your HBA directly from the vender
# To install HBAAPI stuff, all You need to do is to compile the common
# part you obtained from the sourceforge. This will produce libHBAAPI.so
# which you need to copy to /usr/lib. You need also copy hbaapi.h to
# /usr/include.
#
# The fc-card-name is the name obtained from the hbaapitest program
# that is part of the hbaapi package. Running hbaapitest will produce
# a verbose output. One of the first line is similar to:
# Apapter number 0 is named: qlogic-qla2200-0
# Here fc-card-name is qlogic-qla2200-0.
#
#hbaping fc-card-name
#
#
# Processes started and stopped with heartbeat. Restarted unless
# they exit with rc=100
#
#respawn userid /path/name/to/run
# Only use this with R1.x style clusters:
respawn hacluster /usr/lib64/heartbeat/ipfail
#
# Access control for client api
# default is no access
#
#apiauth client-name gid=gidlist uid=uidlist
#apiauth ipfail gid=haclient uid=hacluster
###########################
#
# Unusual options.
#
###########################
#
# hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
# deadping - dead time for ping nodes
deadping 30
#
# hbgenmethod - Heartbeat generation number creation method
# Normally these are stored on disk and incremented as needed.
#hbgenmethod time
#
# realtime - enable/disable realtime execution (high priority, etc.)
# defaults to on
#realtime off
#
# debug - set debug level
# defaults to zero
#debug 1
#
# API Authentication - replaces the fifo-permissions-based system of the
past
#
#
# You can put a uid list and/or a gid list.
# If you put both, then a process is authorized if it qualifies under
either
# the uid list, or under the gid list.
#
# The groupname "default" has special meaning. If it is specified, then
# this will be used for authorizing groupless clients, and any client
groups
# not otherwise specified.
#
# There is a subtle exception to this. "default" will never be used in
the
# following cases (actual default auth directives noted in brackets)
# ipfail (uid=HA_CCMUSER)
# ccm (uid=HA_CCMUSER)
# ping (gid=HA_APIGROUP)
# cl_status (gid=HA_APIGROUP)
#
# This is done to avoid creating a gaping security hole and matches the
most
# likely desired configuration.
#
#apiauth ipfail uid=hacluster
#apiauth ccm uid=hacluster
#apiauth cms uid=hacluster
#apiauth ping gid=haclient uid=alanr,root
#apiauth default gid=haclient
# message format in the wire, it can be classic or netstring,
# default: classic
#msgfmt classic/netstring
# do we use logging daemon?
# detail policy:
# 1. if there is any entry for debugfile/logfile/logfacility in ha.cf
# a) if use_logd is not set, logging daemon will not be used
# b) if use_logd is set to on, logging daemon will be used
# c) if use_logd is set to off, logging daemon will not be used
#
#
# 2. if there is no entry for debugfile/logfile/logfacility in ha.cf
# a) if use_logd is not set, logging daemon will be used
# b) if use_logd is set to on, logging daemon will be used
# c) if use_logd is set to off, config error, i.e. you can not turn
# off all logging options
#
# If logging daemon is used, logfile/debugfile/logfacility in this file
# are not meaningful any longer. You should check the config file for
logging
# daemon (the default is /etc/logd.cf)
#
# If you are not sure about this option, don't configure it
#
# use_logd yes/no
use_logd yes
#
# the interval we reconnect to logging daemon if the previous connection
failed
# default: 60 seconds
#conn_logd_time 60
#
#
# Configure compression module
# It could be zlib or bz2, depending on whether u have the corresponding
# library in the system.
compression bz2
#
# Confiugre compression threshold
# This value determines the threshold to compress a message,
# e.g. if the threshold is 1, then any message with size greater than 1 KB
# will be compressed, the default is 2 (KB)
compression_threshold 2
# Enable Heartbeat to do core dumps
coredumps true
# Enable Cluster Resource Manager
crm no
# Increase the real-time priority of Heartbeat
# This should eliminate late heartbeat messages in the logs
rtprio 99
# $Id: drbd.conf.in 7138 2005-12-30 17:55:40Z nreed $
#
# drbd.conf example
#
# parameters you _need_ to change are the hostname, device, disk,
# meta-disk, address and port in the "on <hostname> {}" sections.
#
# you ought to know about the protocol, and the various timeouts.
#
# you probably want to set the rate in the syncer sections
#
# increase timeout and maybe ping-int in net{}, if you see
# problems with "connection lost/connection established"
# (or change your setup to reduce network latency; make sure full
# duplex behaves as such; check average roundtrip times while
# network is saturated; and so on ...)
#
#
# Upgrading from DRBD-0.6.x
#
# Using the size parameter in the disk section (was disk-size) is
# no longer valid. The agreed disk size is now stored
# in DRBD's non volatile meta data files.
#
# NOTE that if you do not have some dedicated partition to use for
# the meta-data, you may use 'internal' meta-data.
#
# THIS HOWEVER WILL DESTROY THE LAST 128M
# OF THE LOWER LEVEL DEVICE.
#
# So you better make sure you shrink the filesystem by 128M FIRST!
# or by 132M just to be sure... :)
#
skip {
As you can see, you can also comment chunks of text
with a 'skip[optional nonsense]{ skipped text }' section.
This comes in handy, if you just want to comment out
some 'resource <some name> {...}' section:
just precede it with 'skip'.
The basic format of option assignment is
<option name><linear whitespace><value>;
It should be obvious from the examples below,
but if you really care to know the details:
<option name> :=
valid options in the respective scope
<value> := <num>|<string>|<choice>|...
depending on the set of allowed values
for the respective option.
<num> := [0-9]+, sometimes with an optional suffix of K,M,G
<string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
<name> := [/_.A-Za-z0-9-]+
}
#
# At most ONE global section is allowed.
# It must precede any resource section.
#
# global {
# use this if you want to define more resources later
# without reloading the module.
# by default we load the module with exactly as many devices
# as configured mentioned in this file.
#
# minor_count 5;
# this is for people who set up a drbd device via the
# loopback network interface or between two VMs on the same
# box, for testing/simulating/presentation
# otherwise it could trigger a run_tasq_queue deadlock.
# I'm not sure whether this deadlock can happen with two
# nodes, but it seems at least extremely unlikely; and since
# the io_hints boost performance, keep them enabled.
#
# With linux 2.6 it no longer makes sense.
# So this option should vanish. --lge
#
# disable_io_hints
# }
#
# this need not be r#, you may use phony resource names,
# like "resource web" or "resource mail", too
#
resource r0 {
# transfer protocol to use.
# C: write IO is reported as completed, if we know it has
# reached _both_ local and remote DISK.
# * for critical transactional data.
# B: write IO is reported as completed, if it has reached
# local DISK and remote buffer cache.
# * for most cases.
# A: write IO is reported as completed, if it has reached
# local DISK and local tcp send buffer. (see also sndbuf-size)
# * for high latency networks
#
#**********
# uhm, benchmarks have shown that C is actually better than B.
# this note shall disappear, when we are convinced that B is
# the right choice "for most cases".
# Until then, always use C unless you have a reason not to.
# --lge
#**********
#
protocol C;
# what should be done in case the cluster starts up in
# degraded mode, but knows it has inconsistent data.
incon-degr-cmd "halt -f";
startup {
# Wait for connection timeout.
# The init script blocks the boot process until the resources
# are connected.
# In case you want to limit the wait time, do it here.
#
# wfc-timeout 0;
# Wait for connection timeout if this node was a degraded cluster.
# In case a degraded cluster (= cluster with only one node left)
# is rebooted, this timeout value is used.
#
degr-wfc-timeout 120; # 2 minutes.
}
disk {
# if the lower level device reports io-error you have the choice of
# "pass_on" -> Report the io-error to the upper layers.
# Primary -> report it to the mounted file system.
# Secondary -> ignore it.
# "panic" -> The node leaves the cluster by doing a kernel panic.
# "detach" -> The node drops its backing storage device, and
# continues in disk less mode.
#
on-io-error detach;
}
net {
# this is the size of the tcp socket send buffer
# increase it _carefully_ if you want to use protocol A over a
# high latency network with reasonable write throughput.
# defaults to 2*65535; you might try even 1M, but if your kernel or
# network driver chokes on that, you have been warned.
sndbuf-size 512k;
timeout 60; # 6 seconds (unit = 0.1 seconds)
connect-int 10; # 10 seconds (unit = 1 second)
ping-int 10; # 10 seconds (unit = 1 second)
# Maximal number of requests (4K) to be allocated by DRBD.
# The minimum is hardcoded to 32 (=128 kb).
# For hight performance installations it might help if you
# increase that number. These buffers are used to hold
# datablocks while they are written to disk.
#
# max-buffers 2048;
# The highest number of data blocks between two write barriers.
# If you set this < 10 you might decrease your performance.
# max-epoch-size 2048;
# if some block send times out this many times, the peer is
# considered dead, even if it still answers ping requests.
# ko-count 4;
# if the connection to the peer is lost you have the choice of
# "reconnect" -> Try to reconnect (AKA WFConnection state)
# "stand_alone" -> Do not reconnect (AKA StandAlone state)
# "freeze_io" -> Try to reconnect but freeze all IO until
# the connection is established again.
on-disconnect reconnect;
}
syncer {
# Limit the bandwith used by the resynchronisation process.
# default unit is KB/sec; optional suffixes K,M,G are allowed
#
rate 10M;
# All devices in one group are resynchronized parallel.
# Resychronisation of groups is serialized in ascending order.
# Put DRBD resources which are on different physical disks in one group.
# Put DRBD resources on one physical disk in different groups.
#
group 1;
# Configures the size of the active set. Each extent is 4M,
# 257 Extents ~> 1GB active set size. In case your syncer
# runs @ 10MB/sec, all resync after a primary's crash will last
# 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
# BTW, the hash algorithm works best if the number of al-extents
# is prime. (To test the worst case performace use a power of 2)
al-extents 257;
}
on node1.cluster1.test.awarix.com {
device /dev/drbd0;
disk @SHARED_DEVICE@;
address 169.254.0.1:7788;
meta-disk @[EMAIL PROTECTED];
# meta-disk is either 'internal' or '/dev/ice/name [idx]'
#
# You can use a single block device to store meta-data
# of multiple DRBD's.
# E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
# for two different resources. In this case the meta-disk
# would need to be at least 256 MB in size.
#
# 'internal' means, that the last 128 MB of the lower device
# are used to store the meta-data.
# You must not give an index with 'internal'.
}
on node2.cluster1.test.awarix.com {
device /dev/drbd0;
disk @SHARED_DEVICE@;
address 169.254.0.2:7788;
meta-disk @[EMAIL PROTECTED];
}
}
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/