On Monday 23 July 2007, Rudi Ahlers wrote:
> 
> Max Hofer wrote:
> > HA is about redundancy. This means having 1 NIC will
> > probably not make your system very reliable.
> >
> > Does the NIC have 2 ports?
> >
> > Your worst case scenario (as I can think of):
> > - connection failures to one of the NICs (!not NIC failure)
> > which will turn off/on/off connectivity (this may be cabling
> > problems, switch problems etc.)
> > ---> your cluster will split, rejoin, split rejoin. And I'm sure
> > it will not take long and you have a split brain problem.
> >
> > Back to your question:
> > - we run heartbeat over the LAN, not a big problem if you use 
> > it only to share heartbeat data (i.e. no big data is transferred 
> > like DRBD devices etc.).
> >
> > Your problem is that you do not have connection redundancy 
> > between the cluster nodes.
> >
> > Possible solution: what about interconnecting the two servers
> > with a serial cable? At least you will avoid the split brain problem
> > when the network fails.
> >
> > What about USB connections?
> >
> > kind regards Max
> >
> > On Monday 23 July 2007, Rudi Ahlers wrote:
> >   
> >> Hi all
> >>
> >> I hope someone can assist me with this one. I previously (almost a year 
> >> ago) setup heartbeat on 2x Suse 9.3 machines, where each server had a 
> >> wireless NIC & a onboard NIC. The network ran on the wireless NIC's, and 
> >> I then used the onboard NIC's with a crossover cable to monitor heartbeat.
> >>
> >> Recently, we took out the wireless network, and put in a 8 port 10/100 
> >> switch to run everything on. So, each server now only has 1 NIC. The 
> >> wireless gave us too many problems, and getting working cards was a 
> >> problem. To put it differently, the client is 700KM away, and not IT 
> >> literate, so we had to fedex the wireless NIC to them, and try and make 
> >> it work over the phone. Bottomline, running a normal CAT5 LAN is less 
> >> error prone.
> >>
> >> So, to get back to the question. How stable / safe / redundant is it to 
> >> run heartbeat over the same NIC's as the LAN?
> >> If server 1 use IP 192.168.0.5, server 2 use 192.168.0.6 & tell 
> >> heartbeat to server 192.168.0.3 on the LAN, will this work well enough?
> >>
> >>     
> The onboard NIC's are standard one-port NIC's, and from there the 
> servers connect directly into the 8 port 10/100 switch.
> 
> I hear what you're saying, and it makes sense.
> 
> But, here's my situation. The whole network runs on 1 switch (only 7 
> PC's), so if the switch were to fail, so does the network. If the Mobo / 
> CPU / PSU / RAM were to fail on one of the (standard PIV) servers, the 
> server fails. Thus, in either case a second NIC won't help me much. The 
> only reason I use heartbeat, is to make sure there is at least one 
> server up, serving the IP address 192.168.0.3. I use MySQL replication 
> to replicate the MySQL DB, rsync to replicate the intranet, and 
> different DHCP ranges to make sure every PC can get a DHCP address.
OK, you cannot do anything about the switch.

So your HB increases your availability because it provides a failover for:
a) a server being disconnected from the switch
b) failure of a server's NIC
c) a system hang on one server (whatever the reason)

> So, there's no critical data going across the LAN for heartbeat.
There is critical data going over the switch because "MySQL replication" 
is replicating the data from one node to the other over the switch.

> What is the USB stuff that you're talking about?
The two cluster nodes should be able to communicate even when the 
switch goes out, to prevent both nodes from going primary.

I have no clue how the "MySQL replication" behaves if both servers go
primary. Who replicates to whom then? What happens when the switch 
comes back and both nodes are primary (main source of the replication)?

The USB/serial connection would prevent both cluster nodes from thinking
"ahhh, the other is dead, I go primary".
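
To illustrate why a second heartbeat path helps, here is a minimal Python
sketch of the decision each node makes. It is not heartbeat's actual code;
the names HeartbeatLink and peer_is_dead and the 30 second deadtime are
assumptions made up for this example. The idea is only that a node should
declare its peer dead when every link has gone silent, not just the LAN.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatLink:
    """One path between the two nodes (hypothetical model, not heartbeat's API)."""
    name: str
    last_seen: float  # time (in seconds) of the last heartbeat received on this link


def peer_is_dead(links, now, deadtime=30.0):
    """Declare the peer dead only if ALL links have been silent longer than deadtime."""
    return all(now - link.last_seen > deadtime for link in links)


# Scenario: the switch was powered off at t=100, it is now t=140.
now = 140.0
lan = HeartbeatLink("eth0", last_seen=100.0)      # silent since the switch died
serial = HeartbeatLink("ttyS0", last_seen=139.0)  # serial cable still delivers heartbeats

# With the LAN as the only path, each node concludes the other is dead.
print(peer_is_dead([lan], now))
# With the extra serial path, the peer is still known to be alive.
print(peer_is_dead([lan, serial], now))
```

With only the LAN link, both nodes reach the same wrong conclusion at the
same time, which is exactly the split brain case.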

I would suggest the following test scenarios for you:
a) normal tests for failure of a node (just unplug the power)
b) disconnect the active server - wait for IP failover - check if everything
works and then connect it again - then wait a bit and repeat the step with
the currently active server
c) power off the switch - wait a few minutes (see what happens with
the servers) - then power on the switch again. Your cluster must
be able to handle this (meaning the nodes rejoin into one cluster).

If your system can handle those scenarios you will do fine.
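
As a rough aid for judging the result of each test, the pass criterion can
be written down as a small Python sketch. The function name cluster_state
and the node names are hypothetical, and how you find out whether a node
currently holds 192.168.0.3 is left to you; the sketch only classifies
the answers.

```python
def cluster_state(holds_ip):
    """Classify the cluster from a mapping node name -> True if it holds the service IP."""
    owners = sorted(node for node, holds in holds_ip.items() if holds)
    if len(owners) == 1:
        return "ok: " + owners[0] + " is primary"
    if not owners:
        return "down: no node serves the IP"
    # More than one node claims the IP at the same time.
    return "split brain: " + ", ".join(owners)


# Expected state after a clean failover or rejoin: exactly one primary.
print(cluster_state({"server1": True, "server2": False}))
# What scenario c) must NOT leave behind once the switch is back.
print(cluster_state({"server1": True, "server2": True}))
```

After every scenario, once things have settled, you want to see exactly
one node holding the IP.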

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems