Still, you want to maximize throughput on your current switch, right? Since your switch does not support Layer 3 switching, you should probably assign every NIC a unique IP address, as suggested in your last paragraph. This would allow you to set one NIC half-duplex in each direction. I have never tried this, but it should work. Generate two different hosts tables and you are in business. The D-Link DGS-1016TG (if that's what you have) has a 32 Gbps backplane, so the switch fabric itself should never be the bottleneck.
What we have tried (on two switches) is to use one NIC for NFS traffic and the other for internode communication. We don't really need a dual-gigabit pipe to our NFS mount, but we do need a relatively low-latency MPI pipe. It sounds like you are in the same boat.
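For what it's worth, here is a rough sketch of how the hosts entries for such a split could be generated. It assumes the two subnets from your message (10.0.0.0 and 10.1.0.0), treated here as /16 networks, and 8 nodes. The names node01 through node08, the "-mpi" suffix, and the hosts_table helper are made up for illustration; adjust them to whatever your cluster actually uses.

```python
import ipaddress


def hosts_table(network, node_count, suffix=""):
    """Return /etc/hosts-style lines for node_count nodes on one network.

    The node names and the suffix are hypothetical; only the subnets
    come from the original message.
    """
    net = ipaddress.ip_network(network)
    usable = iter(net.hosts())  # usable addresses, starting at x.x.0.1
    lines = []
    for index in range(1, node_count + 1):
        address = next(usable)
        lines.append(f"{address}\tnode{index:02d}{suffix}")
    return lines


# One table per NIC: general traffic (NFS, ssh) and MPI-only traffic.
general = hosts_table("10.0.0.0/16", 8)
mpi = hosts_table("10.1.0.0/16", 8, suffix="-mpi")

print("\n".join(general + mpi))
```

The MPI machine file would then list the "-mpi" names, while NFS mounts and ssh keep using the plain names, so each kind of traffic stays on its own NIC.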
At 09:53 PM 5/8/2004 -0300, Peter Cordes wrote:
I have a cluster of 8 dual Opterons, and one unmanaged D-link 16port gigE switch. The Opterons have dual gigE on their mobos, and right now, channel bonding is enabled. This is a bit bogus, because both receiving NICs will get a copy of every packet, I think. (Both NICs get the same MAC address when bonded, and that's what switches keep track of.) I do get ~10 or 20% higher TCP throughput than without bonding, so it is helping a bit. Somewhat surprisingly, UDP packets don't seem to be getting duplicated. I tested with nc -u, talking to nc -l -u, and stuff I typed was only received once. Maybe I'm wrong about the switch duplicating the packets, but I certainly don't get twice the bandwidth.
Anyway, I've been thinking about what can be done with an unmanaged switch. I've considered ARP table hacks like the U. Kentucky flat-network idea (google for KLAT2), but not in enough detail to come up with anything good. Maybe half the nodes could talk to one of the master node's NICs, and half to the other, if the master node has separate MAC addresses and doesn't use bonding? This could be useful if there is significant openMosix or NFS traffic, and not just all<->all MPI traffic.
I've also thought of having two subnets, 10.0.0.0 and 10.1.0.0, with each node having one NIC in each net. 10.0.0.0 could be used for all normal traffic (NFS, ssh, etc.), while 10.1.0.0 could be exclusively for MPI. It might be more convenient to hack config files to get NFS or openMosix using different IP addresses from everything else, though.
Anyway, has anyone done anything like this, or want to expand on this idea? I haven't thought of a useful search string to google on for this kind of thing yet, so if anyone knows any good web pages about this, I'd love to see links.
-- #define X(x,y) x##y Peter Cordes ; e-mail: X([EMAIL PROTECTED] , des.ca)
"The gods confound the man who first found out how to distinguish the hours! Confound him, too, who in this place set up a sundial, to cut and hack my day so wretchedly into small pieces!" -- Plautus, 200 BC

