>>> Lars Marowsky-Bree <[email protected]> wrote on 29.08.2012 at 11:30 in
>>> message <[email protected]>:
> On 2012-08-29T10:15:50, Ulrich Windl <[email protected]>
> wrote:
>
> > The network guys say no. Should "arp" show the Cluster-IP? I cannot see it,
> > so I wonder if something's wrong.
>
> Well, you should see the MAC/IP mapping in the arp table if the host is
> on the same ethernet segment, yes. Otherwise the host doesn't know where
> to send the packets to.
I checked the ARP table of the host that is hosting the cluster IP address.
I thought the host should also accept its own broadcasts. However, the machine
is also a Xen hypervisor (Dom0), so everything is connected via software bridges.
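A host normally does not list its own addresses in its ARP table, so the lookup is more telling on another machine on the same segment. A minimal sketch of that lookup, assuming a Linux host and the /proc/net/arp layout (a header line, then IP address, HW type, Flags, HW address, Mask, Device per row); the function name lookup_arp is hypothetical:

```python
from typing import Optional


def lookup_arp(table_text: str, ip: str) -> Optional[str]:
    """Return the HW address recorded for `ip`, or None if there is no complete entry.

    Assumes the text format of Linux /proc/net/arp.
    """
    lines = table_text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed rows
        address, flags, hw_address = fields[0], fields[2], fields[3]
        # Flag 0x2 (ATF_COM) marks a completed entry; incomplete ones show 00:00:00:00:00:00.
        if address == ip and int(flags, 16) & 0x2:
            return hw_address.lower()
    return None
```

On another node or client on the segment, something like lookup_arp(open("/proc/net/arp").read(), "172.20.3.59") should return the cluster MAC once an ARP exchange has taken place; None means there is no complete entry.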
>
> You should see the ARP responses come in with tcpdump/wireshark.
>
> > Could the "martian source" thing be responsible? I see this for the ARPs:
> > Aug 29 09:21:35 o1 kernel: [ 1261.556861] martian source 172.20.3.59 from
> > 172.20.3.59, on dev br0
>
> That's difficult to comment on without knowing if "o1" is the gateway
> router, one of the servers, or one of the clients on the network, and
> what the network interfaces are like.
"o1" is a cluster node hosting the cluster IP.
[...]
> > > Can you get the network trace of the arp traffic on the router into the
> > > subnet when an outside ping comes in?
> > I see this on the host (one cluster node):
> > o1:~ # tcpdump -p -i br0 -s100 -v -n host 172.20.3.59
The router is part of some HP switch to which I have no access.
>
> Are you trying to reach the cluster IP from one of the cluster nodes
> itself? I'm not sure that will work.
Why not (just curious)? But no, I was using a host some distance away.
>
> > tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 100 bytes
> > 09:43:38.305460 arp who-has 172.20.3.59 tell 172.20.3.62
> > 09:43:38.305493 arp reply 172.20.3.59 is-at f1:e9:91:b1:b9:51
> >
> > (172.20.3.62 is the gateway)
>
> That looks OK. You should check the ARP table on the gateway if it is
> correctly updated with the address, though.
I'll have to meet my local guru ;-) ... Actually, the MAC address was found on
the gateway, listed as "(dynamic)", whatever that means...
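The MAC in that ARP reply, f1:e9:91:b1:b9:51, is the CLUSTERIP clustermac from the firewall rules. The iptables-extensions documentation requires the CLUSTERIP clustermac to be a link-layer multicast address, meaning the least significant bit of the first octet is set (0xf1 is odd). Gateways differ in how they treat an ARP reply that maps a unicast IP to such a MAC, so it may be worth asking the network side what the HP gateway does with it. A small sketch of the bit check; is_multicast_mac is a hypothetical helper:

```python
def is_multicast_mac(mac: str) -> bool:
    """Return True if the I/G bit (least significant bit of the first octet) is set."""
    # Accept both colon- and hyphen-separated notation, e.g. "f1:e9:..." or "F1-E9-...".
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)
```

For example, is_multicast_mac("f1:e9:91:b1:b9:51") is True, while an ordinary NIC address with an even first octet gives False.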
>
> If you try to ping the cluster IP from a client, what does tcpdump show
> on the servers/gateway? Do you see the ICMP ECHO REQUEST go to the
> cluster IP with the above MAC? How do the servers respond?
On a remote server, tcpdump shows only the outgoing ICMP ECHO requests (no
replies) and the TCP connection attempts to 172.20.3.59:445/139. I'm afraid
the packets end at the gateway (as you suspected).
>
> > Packets also arrive via broadcast:
> > 09:45:03.826371 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
> > (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
> > 09:45:13.836608 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP
> > (17), length 271) 172.20.3.59.138 > 172.20.3.63.138: NBT UDP PACKET(138)
>
> You have traffic *from* the cluster IP to the broadcast address of your
> network? That looks wrong. All nodes are likely to log a martian source
> for that one (since they're getting traffic from a locally bound IP). To
> communicate internally in the cluster, Samba should use one of the local
> IP addresses.
I thought port 138 is NetBIOS, which is notorious for broadcasting all the time.
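The point about martian sources can be modelled roughly: every node has 172.20.3.59 bound, so a datagram arriving from the wire with that source address looks to each node as if it came from itself, which the kernel logs as a martian source. The broadcast destination 172.20.3.63 suggests a /26 network (172.20.3.0/26); that prefix is inferred here, not stated in the thread. A simplified sketch (classify_incoming is hypothetical; the real kernel check also involves routing lookups and reverse-path filtering):

```python
import ipaddress
from typing import Iterable, List


def classify_incoming(src: str, dst: str, local_addrs: Iterable[str], subnet: str) -> List[str]:
    """Label a packet that arrived on the wire, from one node's point of view.

    Simplified model: only "source is one of my own addresses" and
    "destination is the subnet broadcast address" are checked.
    """
    net = ipaddress.ip_network(subnet)
    local = {ipaddress.ip_address(a) for a in local_addrs}
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    labels: List[str] = []
    if s in local:
        # A packet from outside that claims one of our own addresses as its
        # source: the situation the kernel reports as "martian source".
        labels.append("martian-source")
    if d == net.broadcast_address:
        labels.append("subnet-broadcast")
    return labels
```

For the NBT datagrams above, classify_incoming("172.20.3.59", "172.20.3.63", ["172.20.3.59"], "172.20.3.0/26") returns ["martian-source", "subnet-broadcast"] on any node that has the cluster IP bound.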
>
> The cluster IP is only useful for communicating with the outside world,
> not inside the cluster itself.
Well, the surprising thing is that this setup doesn't work here, yet it is
supported by Novell. In contrast, CTDB's "public_address" works just fine here,
but isn't supported by Novell: "Due to technical limitations, this also includes
the CTDB internal fail-over functionality for IP address take-over. Please note
that this part is not supported by Novell. Only Pacemaker clusters are fully
supported."
>
> > Still don't know where to start debugging.
>
> Start with something simpler than Samba, see if the CIP can be pinged
> from the outside and what happens there.
Well, shouldn't the manual (sle-ha-manuals_en/manual/book.sleha.html) include
some notes on understanding and/or troubleshooting clustered IP addresses?
Anyway, once a clustered IP address is up, it can also be tested with ping.
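Since ping exercises only ICMP, a plain TCP connect to the Samba ports from the same remote host helps separate "the cluster IP is unreachable" from "the service is not answering". A minimal sketch using only the Python standard library; tcp_port_open is a hypothetical helper:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connect; True if the three-way handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (socket.timeout is an OSError subclass),
        # or no route to host: in every case the port is not reachable from here.
        return False
```

From the remote client, for example: for port in (139, 445): print(port, tcp_port_open("172.20.3.59", port)). A timeout on both ports, together with unanswered pings, points at the path (such as the gateway) rather than at Samba itself.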
I also inspected the firewall (though that's a bit complicated for me):
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target    prot opt in     out     source               destination
    0     0 CLUSTERIP all  --  br0    *       0.0.0.0/0            172.20.3.59
            CLUSTERIP hashmode=sourceip-sourceport clustermac=F1:E9:91:B1:B9:51
            total_nodes=5 local_node=2 hash_init=0
[...]
 307K   47M input_int all  --  br0    *       0.0.0.0/0            0.0.0.0/0
[...]
    0     0 input_int all  --  eth0   *       0.0.0.0/0            0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target    prot opt in     out     source               destination
30836 1584K ACCEPT    all  --  *      *       0.0.0.0/0            0.0.0.0/0
            PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
 pkts bytes target    prot opt in     out     source               destination
 618K   92M ACCEPT    all  --  *      *       0.0.0.0/0            0.0.0.0/0
[...]
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target    prot opt in     out     source               destination
  148 10168 ACCEPT    all      *      *       ::/0                 ::/0
            PHYSDEV match --physdev-is-bridged
[...]
Chain input_int (8 references)
 pkts bytes target    prot opt in     out     source               destination
  488 35136 ACCEPT    all      *      *       ::/0                 ::/0
[...]
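The CLUSTERIP rule spreads connections by hashing: with hashmode=sourceip-sourceport and total_nodes=5, each connection's source IP and port map to one node number from 1 to 5, and a node accepts only the packets whose hash matches its local_node (here 2), dropping the rest. The counters on that rule also read 0 packets, which suggests that no packet addressed to 172.20.3.59 had reached INPUT on br0 of this node when the dump was taken. The sketch below illustrates only the bucket selection; it uses CRC32 as a stand-in, whereas the kernel uses its own jhash-based function, so the concrete node numbers will not match a real cluster. The function names are hypothetical.

```python
import zlib


def responsible_node(src_ip: str, src_port: int, total_nodes: int, hash_init: int = 0) -> int:
    """Map a connection to a 1-based node number, mimicking hashmode=sourceip-sourceport.

    CRC32 is only a stand-in for the kernel's jhash-based hash, so the node
    numbers produced here will differ from those on a real cluster.
    """
    key = f"{src_ip}:{src_port}".encode()
    return zlib.crc32(key, hash_init) % total_nodes + 1


def node_accepts(src_ip: str, src_port: int, total_nodes: int, local_node: int,
                 hash_init: int = 0) -> bool:
    """True if the node numbered local_node would accept this connection; the others drop it."""
    return responsible_node(src_ip, src_port, total_nodes, hash_init) == local_node
```

With the parameters from the rule above, node_accepts(client_ip, client_port, 5, 2) holds for roughly one connection in five; the remaining buckets have to be served by the other nodes, otherwise those connections go unanswered.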
Regards,
Ulrich
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems