Tang, Changqing wrote:
I am testing uDAPL v1 from OFED 1.3 on three nodes: n1, n2, and n3.
Running two ranks between n1 and n2 works, and between n2 and n3 works
as well, but running between n1 and n3 fails with:

dat_cr_accept() failed: DAT_INTERNAL_ERROR

What could be the reason? I did not change anything except the
nodes used for the run. Thanks for the help.


What IPoIB interfaces are configured on the nodes? Can you ping via
IPoIB from n1 to n3? Are you using the same IB port on each node?

This error could be caused by a physical port mismatch between
the connect request and the listen binding: the ARP reply may resolve
the address to a different port than the one the listener is bound to.

If a node has multiple interfaces, one of them may reply to an ARP
request directed at another interface on the same system. The
following configuration makes the interfaces ignore ARP requests
that are not directed to their own IP address.

Add the following lines to /etc/sysctl.conf (then run "sysctl -p" or
reboot to apply them):

net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.ib0.arp_ignore=1
net.ipv4.conf.ib1.arp_ignore=1

or set them at runtime with sysctl (this does not persist across reboots):

sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.ib0.arp_ignore=1
sysctl -w net.ipv4.conf.ib1.arp_ignore=1
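
To confirm the settings took effect on each node, something like the
following could be used. This is only a minimal sketch: the function name
check_arp_ignore and the CONF_DIR variable are made up for illustration
(CONF_DIR exists so the check can be pointed at a scratch directory), and
the interface names ib0 and ib1 are simply the ones used above.

```shell
#!/bin/sh
# Sketch: report the arp_ignore value for "all" and each IPoIB interface.
# CONF_DIR is a hypothetical override; the default is the real procfs path.
CONF_DIR="${CONF_DIR:-/proc/sys/net/ipv4/conf}"

check_arp_ignore() {
    # $1 = interface name as it appears under conf/ (e.g. all, ib0, ib1)
    file="$CONF_DIR/$1/arp_ignore"
    if [ ! -r "$file" ]; then
        echo "$1: not present"
        return 1
    fi
    value=$(cat "$file")
    if [ "$value" -ge 1 ]; then
        echo "$1: arp_ignore=$value (ok)"
    else
        # 0 is the kernel default: reply for any local address on any interface
        echo "$1: arp_ignore=$value (replies to ARP for any local address)"
        return 1
    fi
}

# Check the same three entries that the sysctl lines above set.
for ifc in all ib0 ib1; do
    check_arp_ignore "$ifc" || true
done
```

Running this on n1 and n3 should show whether either node is still at
the default of 0 on one of its IPoIB interfaces.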

-arlin
