Keep us posted. :)

On 21/10/14 08:40 AM, John Scalia wrote:
I've been checking hostname resolution this morning, and all the systems
are listed in each /etc/hosts file (No DNS in this environment.) and
ping works on every system both to itself and all the other systems. At
least it's working on the 10.10.1.0/24 network.

I ran tcpdump trying to see what traffic is on port 5405 on each system,
and I'm only seeing outbound on each, even though netstat shows each is
listening on the multicast address. My suspicion is that the router is
eating the multicast broadcasts, so I may try the unicast address
instead, but I'm waiting on one of our network engineers to see if my
suspicion is correct about the router. He volunteered to help late
yesterday.
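
One way to make the "only seeing outbound" observation concrete is to count packets per source address in the capture. What follows is a rough sketch, not something taken from this thread: `count_sources` is a hypothetical helper name, and `eth1` in the usage comment is an assumption (use whichever interface carries the 10.10.1.x address on that node). If only the node's own address shows up, nothing from the peers is arriving.

```shell
# Hypothetical helper: count packets per source address in
# `tcpdump -n` output read from stdin.
count_sources() {
    awk '$2 == "IP" {
             src = $3
             sub(/\.[0-9]+$/, "", src)   # strip the trailing .port
             n[src]++
         }
         END { for (s in n) print s, n[s] }' | sort
}

# Usage on a node (needs root; eth1 is an assumption):
#   tcpdump -n -c 50 -i eth1 udp port 5405 | count_sources
```

Running it on each node in turn shows whether any peer traffic reaches that node at all.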

On 10/20/2014 4:34 PM, Digimer wrote:
It looks sane on the surface. The 'gethostip' tool comes from the
'syslinux' package, and it's really handy! The '-d' says to give the
IP in dotted-decimal notation only.

What I was trying to see was whether the 'uname -n' resolved to the IP
on the same network card as the other nodes. This is how corosync
decides which interface to send cluster traffic onto. I suspect you
might have a general network issue, possibly related to multicast.
(Some switches and some hypervisor virtual networks don't play nice
with corosync).
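
For systems without gethostip, a rough equivalent of that check can be sketched with getent and ip. This is an illustration only: `iface_for_ip` is a hypothetical helper name, and it assumes the usual one-line-per-address layout of `ip -o -4 addr show` (interface name in the second field, address/prefix in the fourth).

```shell
# Hypothetical helper: print the interface that carries the given
# IPv4 address. Reads `ip -o -4 addr show` output from stdin.
iface_for_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Usage on a node:
#   addr=$(getent hosts "$(uname -n)" | awk '{ print $1; exit }')
#   ip -o -4 addr show | iface_for_ip "$addr"
```

If the interface printed is not the one on the network shared with the other nodes, the node name is resolving to the wrong address.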

Have you tried unicast? If not, try adding the transport="udpu"
attribute to the <cman ... /> element, i.e. <cman transport="udpu" ... />.
Do note that unicast isn't as efficient as multicast, so though it might
work, I'd personally treat it as a debug tool to isolate the source of
the problem.
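
A minimal sketch of that change, assuming the empty `<cman/>` element from the cluster.conf quoted in this thread. The function name `with_udpu` is hypothetical, and it only prints the edited config to stdout so it can be reviewed first. The config_version attribute would still need to be raised and the file copied to every node.

```shell
# Hypothetical helper: print a copy of cluster.conf with the empty
# <cman/> element switched to UDP unicast. The input file is not modified.
with_udpu() {
    sed 's|<cman/>|<cman transport="udpu"/>|' "$1"
}

# Example (review the output before using it):
#   with_udpu /etc/cluster/cluster.conf > /tmp/cluster.conf.udpu
```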

cheers

digimer

PS - Can you share your pacemaker configuration?

On 20/10/14 03:40 PM, John Scalia wrote:
Sure, and thanks for helping.

Here's the /etc/cluster/cluster.conf file, and it is identical on all
three systems:

<cluster config_version="11" name="pgdb_cluster">
   <fence_daemon/>
   <clusternodes>
     <clusternode name="csgha1" nodeid="1">
       <fence>
         <method name="pcmk-redirect">
           <device name="pcmk" port="csgha1"/>
         </method>
       </fence>
     </clusternode>
     <clusternode name="csgha2" nodeid="2">
       <fence>
         <method name="pcmk-redirect">
           <device name="pcmk" port="csgha2"/>
         </method>
       </fence>
     </clusternode>
     <clusternode name="csgha3" nodeid="3">
       <fence>
         <method name="pcmk-redirect">
           <device name="pcmk" port="csgha3"/>
         </method>
       </fence>
     </clusternode>
   </clusternodes>
   <cman/>
   <fencedevices>
     <fencedevice agent="fence_pcmk" name="pcmk"/>
   </fencedevices>
   <rm>
     <failoverdomains/>
     <resources/>
   </rm>
</cluster>

uname -n reports "csgha1" on that system, "csgha2" on its system, and
"csgha3" on the last system.
I don't seem to have gethostip on any of these systems, so I don't know if
the next section helps or not.
"ifconfig -a" reports:
  csgha1: eth0 = 172.17.1.21
          eth1 = 10.10.1.128
  csgha2: eth0 = 10.10.1.129
          eth1 = 172.17.1.3
  csgha3: eth0 = 172.17.1.23
          eth1 = 10.10.1.130
Yeah, I know csgha2 looks a little weird, but it was the way our automated
VM control did the interfaces.
The /etc/hosts file on each system only has the 10.10.1.0/24 address for
each system in it.
iptables is not running on these systems.

Let me know if you need more information, and I very much appreciate
your assistance.
--
Jay

On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:

On 20/10/14 02:50 PM, John Scalia wrote:

Hi all,

I'm trying to build my first ever HA cluster and I'm using 3 VMs running
CentOS 6.5. I followed the instructions to the letter at:

http://clusterlabs.org/quickstart-redhat.html

and everything appears to start normally, but if I run "cman_tool nodes
-a", I only see:

Node  Sts   Inc   Joined               Name
   1   M     64   2014-10-20 14:00:00  csgha1
       Addresses: 10.10.1.128
   2   X      0                        csgha2
   3   X      0                        csgha3

On the other systems, the output is the same except for which system is
shown as joined. Each shows just itself as belonging to the cluster.
Also, "pcs status" shows the same picture, with the non-self systems
offline. I've checked "netstat -an" and see each machine listening on
ports 5404 and 5405. The logs are rather involved, but I'm not
seeing errors in them.
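
The Sts column is what matters in that output: M marks a member and X a node that is not a member. Here is a small sketch for listing the non-member nodes, assuming the usual layout with one line per node and the node name in the last column. `non_members` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of nodes whose status is not "M".
# Reads `cman_tool nodes` (or `cman_tool nodes -a`) output from stdin;
# header and "Addresses:" lines are skipped because their first field
# is not a node ID.
non_members() {
    awk '$1 ~ /^[0-9]+$/ && $2 != "M" { print $NF }'
}

# Usage on a node:
#   cman_tool nodes -a | non_members
```

On a healthy three-node cluster this should print nothing on any node.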

Any ideas for where to look for what's causing them to not communicate?
--
Jay


Can you share your cluster.conf file please? Also, for each node:

* uname -n
* gethostip -d $(uname -n)
* ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
* iptables-save | grep -i multi

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


