Oh. If you're trying to specifically attach to bond0 (-i bond0) and Ganglia isn't doing it, that sounds like a Ganglia bug to me. However, not even knowing what Ganglia does, never mind its inner workings, I'm not very authoritative on the subject. :)

-Mark

david losada carballo wrote:
Hi Mark,

on all nodes Ganglia is started after enslaving the interfaces. Ganglia
accepts a "-i <interface>" flag that makes it bind to a specific
interface. It binds to the IP 239.2.11.71, and I've set a host route for
that IP that points to the bond0 interface on every node.
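
For readers following along, here is a minimal Python sketch of one way a program can pin a multicast membership to a named interface on Linux: it passes the interface index in a struct ip_mreqn rather than a local address. This is not taken from gmond, and how gmond implements "-i" is not stated in this thread. The helper names build_mreqn and join_on_interface are made up for illustration. An index-based join may matter here because an address-based join could be ambiguous when several devices carry the same address, as bond0, eth0 and eth1 do in the ifconfig output quoted further down.

```python
import socket
import struct

GROUP = "239.2.11.71"  # multicast group quoted in this thread


def build_mreqn(group: str, ifindex: int) -> bytes:
    """Pack a Linux struct ip_mreqn: group address, local address, interface index.

    The local address is left as 0.0.0.0 so that the interface index alone
    selects the device.
    """
    return struct.pack(
        "4s4si",
        socket.inet_aton(group),
        socket.inet_aton("0.0.0.0"),
        ifindex,
    )


def join_on_interface(sock: socket.socket, group: str, ifname: str) -> None:
    """Join `group` on the interface called `ifname` (hypothetical helper, Linux only)."""
    ifindex = socket.if_nametoindex(ifname)  # raises OSError if the device is missing
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        build_mreqn(group, ifindex),
    )
```

To try it, create a UDP socket with socket.socket(socket.AF_INET, socket.SOCK_DGRAM), call join_on_interface(sock, GROUP, "bond0"), and then check /proc/net/igmp to see which device lists the group.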

thanks,

On Fri, 26 Apr 2002 08:55:24 -0700
Mark Smith <[EMAIL PROTECTED]> wrote:


Are you starting Ganglia before or after you enslave the interfaces?  If
you start the multicast process before you create bond0, that can
happen.

I'm also unclear what the expected result is when creating a multicast
socket without specifically stating which interface to use when multiple
interfaces are available.  To which interface does your default route
point?

Can ganglia specify on which interface it binds its multicast addresses?
(I'm not familiar with ganglia;  I don't even know what it is.)

-Mark

david losada carballo wrote:

hi everybody,

I'm not completely sure whether this is a ganglia or a bonding issue,
so I'm sending the following to both lists.

I'm experiencing some problems trying to install the ganglia cluster
toolkit in a cluster of computers interconnected with bonded ethernet
through two 3com switches. Every machine has two NICs, each NIC is
connected to a different switch.

Ganglia makes use of multicasting for distributing the status of each
node among all the nodes. It works perfectly with 8 of the computers,
which are the computing nodes (two processors, memory, two NICs, no
HD)

Here you can see the contents of /proc/net/igmp, once I start ganglia
with "gmond -i bond0" in one of these nodes:

1       lo        :     0      V2
                                010000E0     1 0:FB5C0BCB               0
2       eth0      :     1      V2
                                010000E0     1 0:FB5C0BCB               0
3       eth1      :     1      V2
                                010000E0     1 0:FB5C0BCB               0
4       bond0     :     2      V2
                                470B02EF     1 0:FEF49B60               0
                                010000E0     1 0:FB5C0BCB               0

note that ganglia has registered to bond0, correctly
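
The Group column in the listing above is a hexadecimal IPv4 address. As a small sketch, assuming the listing comes from a little-endian machine (such as x86) where the bytes appear in reverse order, the values can be decoded like this. The function name decode_igmp_group is made up for illustration.

```python
import socket


def decode_igmp_group(hex_group: str) -> str:
    """Convert a Group value from /proc/net/igmp into dotted-quad notation.

    Assumes a little-endian host, so the four bytes are reversed
    before being formatted.
    """
    raw = bytes.fromhex(hex_group)
    return socket.inet_ntoa(raw[::-1])


print(decode_igmp_group("470B02EF"))  # 239.2.11.71, the Ganglia group
print(decode_igmp_group("010000E0"))  # 224.0.0.1, the all-hosts group
```

So the 470B02EF entry under bond0 is the 239.2.11.71 membership, and 010000E0 is the all-hosts group that every multicast-capable interface joins.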

I have another node that is connected to the cluster and to the
external network, as well. This one has 4 NICs, 2 of which (eth0 and
eth1) are bonded, eth2 goes to the external network and eth3 connects
to one of the switches and is used for booting the nodes (DHCP/TFTP).
This node basically gives NFS service and acts as a gateway to the
rest of the nodes. It should also be used to report ganglia monitored
metrics (that is, nifty statistics and plots about the status of the
clusters) to the external world, through HTTP.

It's this node that I'm having problems with. After starting 'gmond -i
bond0', /proc/net/igmp looks like this:

Idx     Device    : Count Querier       Group    Users Timer    Reporter
1       lo        :     0      V2
                                010000E0     1 0:FFFAA0D3               0
2       eth0      :     2      V2
                                470B02EF     1 0:FFFF6F8E               1
                                010000E0     1 0:FFFAA0D3               0
3       eth1      :     1      V2
                                010000E0     1 0:DFDA80B2               0
4       eth2      :     1      V2
                                010000E0     1 0:DFDA80B2               0
5       bond0     :     1      V2
                                010000E0     1 0:DFDA80B2               0
6       eth3      :     1      V2
                                010000E0     1 0:DFDA80B2               0

here we can see that ganglia has registered into eth0, instead of
bond0! As a result of this, ganglia in this node can't communicate
with ganglia in other nodes...
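
To check this on many nodes without reading the listing by eye, something like the following Python sketch could be used. It is only an illustration: the function name devices_for_group is made up, the SAMPLE text is the listing from this message, and the byte order assumes a little-endian host.

```python
import socket

# Listing from the gateway node, copied from this message.
SAMPLE = """\
Idx     Device    : Count Querier       Group    Users Timer    Reporter
1       lo        :     0      V2
                                010000E0     1 0:FFFAA0D3               0
2       eth0      :     2      V2
                                470B02EF     1 0:FFFF6F8E               1
                                010000E0     1 0:FFFAA0D3               0
3       eth1      :     1      V2
                                010000E0     1 0:DFDA80B2               0
4       eth2      :     1      V2
                                010000E0     1 0:DFDA80B2               0
5       bond0     :     1      V2
                                010000E0     1 0:DFDA80B2               0
6       eth3      :     1      V2
                                010000E0     1 0:DFDA80B2               0
"""


def devices_for_group(igmp_text: str, group: str) -> list:
    """Return the devices that list `group` in /proc/net/igmp style text."""
    # Little-endian hosts show the address bytes reversed, in upper-case hex.
    wanted = socket.inet_aton(group)[::-1].hex().upper()
    devices = []
    current = None
    for line in igmp_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Idx":
            continue  # skip blank lines and the header
        if line[0].isdigit():
            current = fields[1]  # device line: index, name, ...
        elif fields[0].upper() == wanted and current is not None:
            devices.append(current)  # indented group line under `current`
    return devices


print(devices_for_group(SAMPLE, "239.2.11.71"))  # ['eth0']
```

On a live node the text would come from reading /proc/net/igmp instead of SAMPLE.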

I'm using kernel 2.4.18 with bonding patch + bonding-multicast patch
by Mark Smith and the appropriate ifenslave. Output of ifconfig looks
like this:

bond0     Link encap:Ethernet  HWaddr 00:04:E2:07:9A:F6
          inet addr:192.168.128.1  Bcast:192.168.128.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:31687 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31943 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4906926 (4.6 Mb)  TX bytes:5296714 (5.0 Mb)

eth0      Link encap:Ethernet  HWaddr 00:04:E2:07:9A:F6  Media:unknown
          inet addr:192.168.128.1  Bcast:192.168.128.255  Mask:255.255.255.0
          UP BROADCAST DEBUG RUNNING NOARP PROMISC SLAVE DYNAMIC  MTU:1500  Metric:1
          RX packets:16122 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16303 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:2470826 (2.3 Mb)  TX bytes:2666706 (2.5 Mb)
          Interrupt:11 Base address:0xa000

eth1      Link encap:Ethernet  HWaddr 00:04:E2:07:9A:F6
          inet addr:192.168.128.1  Bcast:192.168.128.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:15565 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15640 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:2436100 (2.3 Mb)  TX bytes:2630008 (2.5 Mb)
          Interrupt:12 Base address:0xc000

eth2      Link encap:Ethernet  HWaddr 00:04:E2:07:9B:46
          inet addr:193.144.17.59  Bcast:193.144.17.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:691 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:14338143 (13.6 Mb)  TX bytes:80149 (78.2 Kb)
          Interrupt:10 Base address:0xe000

eth3      Link encap:Ethernet  HWaddr 00:04:75:7E:BF:49
          inet addr:192.168.127.1  Bcast:192.168.127.255  Mask:255.255.255.0
          UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:569 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:60840 (59.4 Kb)  TX bytes:0 (0.0 b)
          Interrupt:11 Base address:0xb800

any suggestions? has anyone experienced a similar problem?

kind regards,







