> Tundra,
>
> A cluster is only reliant on a quorum server (QS) or quorum disk (QD)
> when the cluster membership changes. Thus neither a single QS nor a
> single QD is a single point of failure, because they are essentially
> passive entities. Having said that, they should be replaced as soon as
> a fault is detected to avoid having any effect on the cluster should
> nodes join or leave it (the cluster).
Thank you for the clarification, Tim. Does join/leave include shutdown/startup as well?

In my experiment with dedicated NICs for the private interconnect, I'm starting to suspect something that I hope you can answer. The Open HA Cluster 2009.06 release notes at http://www.opensolaris.org/os/community/ha-clusters/ohac/Documentation/OHACdocs/relnotes/ list only two supported x86 platforms, both Sun Fire servers, and my hardware is neither of those. I'm wondering whether the interconnect places a specific hardware requirement on the physical NIC itself, one that an Intel card using the 'e1000g' driver satisfies but that devices handled by the 'dnet' and 'rge' drivers do not.

My initial setup is one 'rge' device (on the motherboard) and one 'e1000g' (server-grade Intel PCI-E card) in each node, with a VLAN device on each for the interconnect. To test a dedicated NIC for the interconnect, I added a third NIC to each node, scraping together what I had on hand: one desktop-grade PCI-E Intel e1000g, one older PCI Intel card that shows up as 'iprb', and two old PCI SMC cards that show up as 'dnet'.

What I noticed is that 'scinstall' offered to use the second e1000g for the interconnect, but for the other NICs I had to type in the NIC name explicitly and explicitly state that they are Ethernet. I also see the following warning in dmesg, which makes me wonder again about hardware capabilities:

WARNING: Received non interrupt heartbeat on mltproc1:dnet0 - mltproc0:dnet0 - path timeouts are likely.
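
In case it helps to narrow this down, here is roughly what I plan to run on each node to compare driver bindings and the cluster's view of the transport paths. This is just a rough checklist, assuming the stock 'dladm' from OpenSolaris 2009.06 and the Sun Cluster 'clinterconnect'/'clquorum' commands under /usr/cluster/bin; the grep pattern simply matches the warning above:

    # Which driver is bound to each physical NIC (rge, e1000g, dnet, iprb)?
    dladm show-phys

    # The cluster's view of the interconnect paths and their status.
    /usr/cluster/bin/clinterconnect status

    # Quorum device/server status, per Tim's point about replacing a
    # faulted QS/QD promptly.
    /usr/cluster/bin/clquorum status

    # Any further non-interrupt heartbeat warnings since boot?
    grep -i "non interrupt heartbeat" /var/adm/messages

If anyone knows whether the 'dnet' and 'rge' drivers are simply not expected to work for the private interconnect, that would save me some experimenting.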