One more problem came up when trying to establish one connection per rail, as
illustrated in the diagram below.

         node0                    node1
rail0: psp0 <----------------> ep0         (port 0 on hca)
rail1: psp1 <----------------> ep1         (port 1 on hca)

rail0 is connected first, and its connection is always stable and correct.
However, rail1 sometimes connects properly and sometimes does not.
The error message is as follows:

11836 Waiting for connect response
11836 Error unexpected conn event : DAT_CONNECTION_EVENT_NON_PEER_REJECTED
11836 Error connect_ep: DAT_ABORT

The program establishes the connection for both rails in exactly the same way.
What might cause this?

Regards,

--
Jie Cai




Davis, Arlin R wrote:
This looks like an ARP issue across your IPoIB interfaces.
Please see section 6 of the uDAPL OFED BKM.

http://www.openfabrics.org/downloads/dapl/documentation/uDAPL_ofed_testing_bkm.pdf
6. Multi IB port configuration, IPoIB arp reply issues

When two interfaces are running, one interface may reply to an ARP
request directed to the other interface on the system. The following
configuration will cause the interfaces to ignore ARP requests that are
not specifically for their own IP address.

Add the following lines to /etc/sysctl.conf
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.ib0.arp_ignore=1
net.ipv4.conf.ib1.arp_ignore=1

or use sysctl:
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.ib0.arp_ignore=1
sysctl -w net.ipv4.conf.ib1.arp_ignore=1
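
To confirm that the settings took effect, the values can be read back from
/proc. The following is only a minimal sketch and is not part of the BKM: the
function name check_arp_ignore and its directory argument are made up for
illustration, and it assumes the IPoIB interfaces are named ib0 and ib1 as in
the lines above.

```shell
#!/bin/sh
# Sketch: report whether arp_ignore is set to 1 for each named interface.
# check_arp_ignore is a hypothetical helper, not an OFED or uDAPL tool.
# $1 is the sysctl conf directory; the remaining arguments are interface names.

check_arp_ignore() {
    conf_dir="$1"
    shift
    rc=0
    for ifname in "$@"; do
        f="$conf_dir/$ifname/arp_ignore"
        if [ ! -r "$f" ]; then
            # Interface directory does not exist or is not readable
            echo "$ifname: missing"
            rc=1
            continue
        fi
        val=$(cat "$f")
        if [ "$val" = "1" ]; then
            echo "$ifname: ok"
        else
            echo "$ifname: arp_ignore=$val (expected 1)"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on a live system (assumes interfaces ib0 and ib1 exist):
# check_arp_ignore /proc/sys/net/ipv4/conf all ib0 ib1
```

The function returns non-zero if any listed interface is missing or has a
value other than 1, so it can be run on both nodes before starting the uDAPL
program.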

-arlin

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jie Cai
Sent: Thursday, January 29, 2009 10:53 PM
To: [email protected]
Subject: [ofa-general] Multiports single HCA uDAPL program problem

Hi All,

I am kind of a noob at IB and uDAPL programming. Currently, I am trying to
write a multirail program that uses 2 ports on a single Mellanox
ConnectX HCA on both nodes.

OFED 1.3 has been installed on a SUSE 10.3 Linux system.

The current problem is that IB connections via uDAPL are very unstable,
and sometimes the connection can't be established.
The error message usually looks like this:

20350 Server waiting for connect request on port 45248
accept: ERR dev(0x61d0e0!=0x61d0e0) or port mismatch(1!=2)
20350 Error dat_cr_accept: DAT_INTERNAL_ERROR
20350 Error connect_ep: DAT_INTERNAL_ERROR

Both ports are in the active state:
hca_id:    mlx4_0
   fw_ver:                2.3.000
   node_guid:            0003:ba00:0100:702c
   sys_image_guid:            0003:ba00:0100:702f
   vendor_id:            0x02c9
   vendor_part_id:            25418
   hw_ver:                0xA0
   board_id:            SUN0070000001
   phys_port_cnt:            2
       port:    1
           state:            PORT_ACTIVE (4)
           max_mtu:        2048 (4)
           active_mtu:        2048 (4)
           sm_lid:            10
           port_lid:        8
           port_lmc:        0x00

       port:    2
           state:            PORT_ACTIVE (4)
           max_mtu:        2048 (4)
           active_mtu:        2048 (4)
           sm_lid:            10
           port_lid:        9
           port_lmc:        0x00


I haven't done any specific configuration for multi-port. I assume that
OFED 1.3 handles it automatically.

Could anyone please help me with this?

Regards,
Jie

--
Jie Cai



_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general