Hi Or,

Sorry, I had not seen this earlier thread. I think a combination of the two patches, plus one more change, would address both our problems. The patch I sent ensures that neigh_lookup and neigh_event_send look up the same route when source-based routing tables are in use.

With Sean's patch, if that ip_dev_lookup is also duplicated in addr_send_arp, I think that would address the source-based routing case as well:

addr_send_arp:
+  if (src_ip)
+    oif = ib_dev_lookup(src_ip)
+  s_addr = src_ip;

addr_resolve_remote:
+  if (src_ip)
+    oif = ib_dev_lookup(src_ip)
    s_addr = src_ip; /* this was already there */

Associating the device with the source IP seems to be the correct thing to do in general, but I initially avoided it in favor of source-based routing rules/tables, since Linux does not do this by default. Source-based routing seems to be the only way to get load balancing right for regular IP traffic when two local IPs are on the same subnet. So I thought it would be better to have the src_ip alone cause the routing lookup to associate everything with the correct device, as that would work for regular IP traffic and for everyone else, such as ib_addr clients.
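To illustrate the idea of letting src_ip alone pick the device, here is a minimal user-space sketch. It is not kernel code and not part of either patch; the names toy_src_rule and toy_select_oif are hypothetical, and the "rule table" is only a stand-in for source-based routing rules. A source address of 0 is assumed to mean "no source given".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a source-based routing rule:
 * traffic from src_ip leaves through interface index oif. */
struct toy_src_rule {
    uint32_t src_ip; /* IPv4 source address, host byte order */
    int      oif;    /* output interface index */
};

/* Pick the output interface for a lookup.
 * If src_ip is 0 (not supplied) or matches no rule, fall back to
 * default_oif, which models a lookup that ignores the source address. */
static int toy_select_oif(const struct toy_src_rule *rules, size_t n,
                          uint32_t src_ip, int default_oif)
{
    assert(rules != NULL || n == 0);

    if (src_ip != 0) {
        for (size_t i = 0; i < n; i++) {
            if (rules[i].src_ip == src_ip)
                return rules[i].oif;
        }
    }
    return default_oif;
}
```

In this model, a lookup that passes the source address and one that omits it can return different interfaces for the same destination, which is the mismatch described in this thread.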

The problem I was seeing with ARP is that Linux associates ARP entries with specific devices. If source-based routing is used and the ARP send does not take src_ip into account, the ARP is sent from the default device, and that is the device that gets the ARP entry, whereas neigh_lookup was looking for an entry on the correct device and never found the neighbor.
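The lookup miss can be modeled with a toy neighbor cache keyed by (IP, device). This is only a sketch under that assumption; toy_neigh_table, toy_neigh_add and toy_neigh_lookup are hypothetical names and do not correspond to the kernel's neighbor API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy neighbor entry: an IPv4 address plus the device it was learned on. */
struct toy_neigh_entry {
    uint32_t ip;      /* neighbor address, host byte order */
    int      ifindex; /* interface index the entry belongs to */
};

struct toy_neigh_table {
    struct toy_neigh_entry entries[8];
    size_t count;
};

/* Record a neighbor learned on a given device.
 * Returns 0 on success, -1 if the table is full. */
static int toy_neigh_add(struct toy_neigh_table *t, uint32_t ip, int ifindex)
{
    assert(t != NULL);

    if (t->count >= sizeof(t->entries) / sizeof(t->entries[0]))
        return -1;
    t->entries[t->count].ip = ip;
    t->entries[t->count].ifindex = ifindex;
    t->count++;
    return 0;
}

/* Return the entry only when both the address and the device match. */
static const struct toy_neigh_entry *
toy_neigh_lookup(const struct toy_neigh_table *t, uint32_t ip, int ifindex)
{
    assert(t != NULL);

    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].ip == ip && t->entries[i].ifindex == ifindex)
            return &t->entries[i];
    }
    return NULL;
}
```

If the entry is added for the default device (say ifindex 1) but the lookup asks for the device chosen by the source-based route (say ifindex 2), the lookup returns NULL even though the address has been resolved.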

I think arp_ignore=1 is also needed.

First, from what I can tell, on HPUX there is no device association with ARP entries. Does anyone know why Linux has this flexibility? It looks like this was added in 2.2 or something, and I can't see any useful applications for it.

Second, HPUX sends a single reply to ARP requests, with the correct hardware information, regardless of which device the reply is sent from (or maybe it always replies from the correct device; it doesn't matter, it works). On Linux, each device replies with its own hardware address (unless arp_ignore is used). I've also seen ARP requests being sent with the wrong hardware address, mainly with ping-initiated traffic. When sent from devX, "who has ipZ tell ipY" causes the node with ipZ to create an implicit ARP entry associating ipY with devX (rather than devY).
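As a rough model of why arp_ignore matters here, the sketch below decides whether an interface should answer an ARP request. It covers only arp_ignore values 0 (reply for any local address on any interface) and 1 (reply only if the target address is configured on the receiving interface); other values are not modeled. The names toy_addr and toy_should_reply are hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A local IPv4 address and the interface it is configured on. */
struct toy_addr {
    uint32_t ip;      /* host byte order */
    int      ifindex; /* interface index */
};

/* Decide whether to answer an ARP request for target_ip that arrived
 * on in_ifindex. Only arp_ignore 0 and 1 are modeled; any other value
 * makes this toy function return false. */
static bool toy_should_reply(const struct toy_addr *addrs, size_t n,
                             uint32_t target_ip, int in_ifindex,
                             int arp_ignore)
{
    assert(addrs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (addrs[i].ip != target_ip)
            continue;
        if (arp_ignore == 0)
            return true; /* any local address, any interface */
        if (arp_ignore == 1 && addrs[i].ifindex == in_ifindex)
            return true; /* address lives on the receiving interface */
    }
    return false;
}
```

With two local addresses on the same subnet, mode 0 lets either interface answer for either address, while mode 1 restricts each reply to the interface that owns the target address.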

The initial patch I sent is less restrictive but relies on source-based routing to get everything working; perhaps an explicit device mapping (as above) makes more sense for RDMA traffic. Please correct me if anything I said is incorrect or if these changes conflict with other working configurations.

Thanks for your help.
Leo Tominna

Disclaimer: The statements and opinions expressed here are my own and do not necessarily reflect those of my employer.

On 7/12/2009 4:13 AM, Or Gerlitz wrote:
Leo Tominna wrote:
This patch appears to help when strict ARP handling is enabled or when non-standard routing tables are used. The ARP request is replied to through the device that will be used for subsequent communication, so the ARP entry gets associated with the correct device in the ARP cache. tcpdump shows consistent ARPs generated by similar arguments to ping and rds-ping.
Hi Leo,

Is this patch meant to solve the problem discussed in the "pick the outgoing HCA based on the IP used for bind" threads from February this year on the general and rds-devel mailing lists (http://lists.openfabrics.org/pipermail/general/2009-February/057008.html)? Also, by "strict ARP handling" do you refer to the case where there are multiple NICs on the same L2 broadcast domain (VLAN/Partition)?

Or.


_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
