On 4/12/2014 11:29 AM, Atul Yadav wrote:
> Hi,
>
> Yes, i am able to ping all the nodes connected with Infiniband switch
> For more details please go through the attachment.
OpenSM looks fine, although it is very old (3.3.5). Is this SM host-based
or embedded in one of your switches?

I didn't see any output showing pings working, but I'll take your word
for it. If pings work, I have no theory why this wouldn't work.

-- Hal

>
> Thanks
> Atul Yadav
>
>
> On Sat, Apr 12, 2014 at 7:28 PM, Hal Rosenstock <[email protected]> wrote:
>
> On 4/12/2014 6:59 AM, Atul Yadav wrote:
> > Hi,
> >
> > Thanks for replying.
> > In this architecture, when we run ibv_rc_pingpong between two nodes
> > connected to the same switch, we get a result. But when we use two
> > nodes across 2 switches, we get an error.
> >
> > Success:-
> > [root@oss1 ~]# ibv_rc_pingpong
> > local address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::
> > remote address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::
> > 8192000 bytes in 0.01 seconds = 6992.74 Mbit/sec
> > 1000 iters in 0.01 seconds = 9.37 usec/iter
> > [root@oss1 ~]#
> >
> > [root@mds1 ~]# ibv_rc_pingpong 173.16.1.52
> > local address: LID 0x0022, QPN 0x20004a, PSN 0x7c9dc2, GID ::
> > remote address: LID 0x001e, QPN 0x2c004a, PSN 0x554863, GID ::
> > 8192000 bytes in 0.01 seconds = 7084.97 Mbit/sec
> > 1000 iters in 0.01 seconds = 9.25 usec/iter
> > [root@mds1 ~]#
> >
> > Error
> > [root@nalanda mvapich2-1.9]# ibv_rc_pingpong
> > local address: LID 0x0001, QPN 0x56004e, PSN 0x704d51
> > remote address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2
> >
> > [root@mds1 ~]# ibv_rc_pingpong 173.16.1.1
> > local address: LID 0x0022, QPN 0x1c004a, PSN 0x07a0b2, GID ::
> > client read: Success
> > Couldn't read remote address
> > [root@mds1 ~]#
>
> Looking at libibverbs/examples/rc_pingpong.c:
>
> static struct pingpong_dest *pp_client_exch_dest(const char *servername, int port,
>                                                  const struct pingpong_dest *my_dest)
> {
>     ...
>     gid_to_wire_gid(&my_dest->gid, gid);
>     sprintf(msg, "%04x:%06x:%06x:%s", my_dest->lid, my_dest->qpn,
>             my_dest->psn, gid);
>     if (write(sockfd, msg, sizeof msg) != sizeof msg) {
>         fprintf(stderr, "Couldn't send local address\n");
>         goto out;
>     }
>
>     if (read(sockfd, msg, sizeof msg) != sizeof msg) {
>         perror("client read");
>         fprintf(stderr, "Couldn't read remote address\n");
>         goto out;
>     }
>
> This read is failing for some reason. This is a message exchange
> over some IP network (for example, IPoIB or Ethernet).
>
> >
> > And how can we test that our ftree topology is working fine?
> >
> > Please go through the attachment.
>
> Looks like LIDs are assigned, but I can't tell about routing from the
> info supplied. The topology looks relatively simple (5 switches,
> homogeneous 4x QDR links). Is the OpenSM log clean? Any fat tree
> related messages? This is likely not an SM issue.
>
> The next issues are end node related (probably with IPoIB
> configuration). Can you ping between the nodes which fail
> rc_pingpong? If not,
>
> -- Hal
>
> >
> > Thank You
> > Atul Yadav
> >
> >
> > On Sat, Apr 12, 2014 at 12:14 AM, Hal Rosenstock
> > <[email protected]> wrote:
> >
> > On 4/11/2014 2:21 PM, Atul Yadav wrote:
> > > Dear Team,
> > >
> > > We are trying to build a fat tree topology.
> > > The details are given below:
> > > Unmanaged switches, 36 port, quantity 5
> > > As per some blog, we need to modify the opensm.conf file.
> > > But we are unable to identify some parameters like:-
> > > root_guid_file ???????
> >
> > Fat tree routing will try to autodetect the roots, but this may not
> > work and it is better to specify the root GUIDs. In your case, they
> > are the GUIDs for switches A and B.
> >
> > The root GUID file is then provided to OpenSM either via the conf
> > file or command line parameters.
> > The command line parameter is
> > [-a | --root_guid_file <path to file>]
> >
> > OpenSM man page says:
> >
> > -a, --root_guid_file <file name>
> >        Set the root nodes for the Up/Down or Fat-Tree routing
> >        algorithm to the guids provided in the given file (one to
> >        a line).
> >
> > It also says:
> >
> > If the root guid file is not provided ('-a' or '--root_guid_file'
> > options), the topology has to be pure fat-tree that complies with
> > the following rules:
> >  - Tree rank should be between two and eight (inclusively)
> >  - Switches of the same rank should have the same number
> >    of UP-going port groups*, unless they are root switches,
> >    in which case they shouldn't have UP-going ports at all.
> >  - Switches of the same rank should have the same number
> >    of DOWN-going port groups, unless they are leaf switches.
> >  - Switches of the same rank should have the same number
> >    of ports in each UP-going port group.
> >  - Switches of the same rank should have the same number
> >    of ports in each DOWN-going port group.
> >  - All the CAs have to be at the same tree level (rank).
> >
> > If the root guid file is provided, the topology doesn't have to be
> > pure fat-tree, and it should only comply with the following rules:
> >  - Tree rank should be between two and eight (inclusively)
> >  - All the Compute Nodes** have to be at the same tree level (rank).
> >    Note that non-compute node CAs are allowed here to be at
> >    different tree ranks.
> >
> > * ports that are connected to the same remote switch are referenced
> >   as port group.
> >
> > ** list of compute nodes (CNs) can be specified by -u or
> >    --cn_guid_file OpenSM options.
> >
> > -- Hal
> >
> > >
> > > Need your input for this ?
> > >
> > > Thank You
> > > Atul Yadav
> > >
> > > _______________________________________________
> > > ewg mailing list
> > > [email protected]
> > > http://lists.openfabrics.org/mailman/listinfo/ewg

_______________________________________________
ewg mailing list
[email protected]
http://lists.openfabrics.org/mailman/listinfo/ewg
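
For reference, the root GUID setup discussed in this thread could look roughly like the opensm.conf fragment below. This is only a sketch under stated assumptions, not a tested configuration: the file path is a placeholder, and the file it points to would need to list the real node GUIDs of the two root switches (A and B in the thread), one per line. Those GUIDs are not given anywhere in the thread and would have to be read from the fabric, for example with ibswitches.

```
# opensm.conf fragment (sketch; the path below is a placeholder)

# Select fat-tree routing instead of the default routing engine.
routing_engine ftree

# File listing the root switch GUIDs, one per line.
root_guid_file /etc/opensm/root_guids.txt
```

The same settings can likely be passed on the command line as `opensm -R ftree -a /etc/opensm/root_guids.txt`, using the `-a` option quoted from the man page above.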
