On Mon, Apr 13, 2009 at 4:24 PM, Hal Rosenstock <[email protected]> wrote:
> On Mon, Apr 13, 2009 at 5:50 PM, Chris Worley <[email protected]> wrote:
>
> <snip...>
>
>>> Were the ports getting to LinkUp/Active before partitions were configured ?
>>
>> Yes, before I started trying to partition, all the nodes could
>> communicate... except they'd all use just one port on the server and I
>> couldn't get the throughput I needed.
>
> I suspect the switch SMA went south sometime after this.
I'm now power-cycling the switch for each partition change.

<snip>

>>>> Partition "part2" with P_Key=2 should connect this client's port 0 to
>>>> the server on port 1 of mlx4_1
>>>
>>> Do you really mean port 0 ?
>>
>> Nope... in this case I have 0x0002c903000292b0 in part2 in my
>> partitions file, which is port 1, the second port of the adapter. I'm
>> hoping to use both ports of all adapters on the server.
>
> So you're talking about physical marking on the card rather than
> actual (logical) port number.

I'm not sure about board markings... both ports are attached to the
switch, for all IB adapters, so all should work. I'm using the
numbers provided by ibstat.

>> So, on one client... the one corresponding to "part2" in the
>> partitions file, I put the P_Key into the "create child":
>>
>> echo 0x2 > /sys/class/net/ib0/create_child
>>
>> ... and did likewise on the host, for ib3 (the second port on the
>> second adapter):
>>
>> echo 0x2 > /sys/class/net/ib3/create_child
>
> I'm not 100% sure but I think you may need the full member PKey on at
> least one of them (0x800x).

I've changed the P_Keys to 0x800x, and set the "create_child" files
appropriately.

>
>> Still, no ping (the interfaces are set up correctly).
>
> Are there still join failure messages on the client and/or server ?
> What do they say now ?
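For reference, the "full member" suggestion above (0x800x) comes down to the top bit of the 16-bit P_Key. The sketch below is only an illustration of that bit arithmetic; the helper names are made up here and are not part of OpenSM or the IPoIB driver. It assumes the usual InfiniBand rule that two P_Keys match when their low 15 bits are equal and at least one of them has the membership bit set.

```python
FULL_MEMBER_BIT = 0x8000   # bit 15 of the P_Key: 1 = full member, 0 = limited
BASE_MASK = 0x7FFF         # low 15 bits identify the partition itself


def full_member(pkey: int) -> int:
    """Return the full-member form of a P_Key (hypothetical helper)."""
    return (pkey & BASE_MASK) | FULL_MEMBER_BIT


def is_full_member(pkey: int) -> bool:
    """True if the membership bit is set."""
    return bool(pkey & FULL_MEMBER_BIT)


def pkeys_match(a: int, b: int) -> bool:
    """Same partition, and at least one side is a full member."""
    same_partition = (a & BASE_MASK) == (b & BASE_MASK)
    return same_partition and (is_full_member(a) or is_full_member(b))


if __name__ == "__main__":
    # The value written to create_child in the thread was 0x2.
    print(hex(full_member(0x2)))
    print(pkeys_match(0x2, 0x2), pkeys_match(0x8002, 0x2))
```

Under that assumption, two limited members (0x2 on both ends) would not be able to talk to each other, which is consistent with Hal's hint that at least one side needs 0x800x.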
Lots of "bad P_Key" notices:

Apr 13 17:32:56 649698 [F59A9A30] 0x03 -> OpenSM 3.2.5_20081207
Apr 13 17:32:56 649737 [F59A9A30] 0x80 -> OpenSM 3.2.5_20081207
Apr 13 17:32:56 650078 [F59A9A30] 0x02 -> osm_vendor_init: 1000 pending umads specified
Apr 13 17:32:56 650201 [F59A9A30] 0x80 -> Entering DISCOVERING state
Apr 13 17:32:56 660286 [F59A9A30] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300026053
Apr 13 17:32:56 684519 [F59A9A30] 0x02 -> osm_vendor_bind: Binding to port 0x2c90300026053
Apr 13 17:32:56 703826 [470BE940] 0x80 -> Entering MASTER state
Apr 13 17:32:56 704953 [470BE940] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
Apr 13 17:32:56 713917 [470BE940] 0x80 -> SUBNET UP
Apr 13 17:32:57 112574 [452BB940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:257 (Bad P_Key) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000741
Apr 13 17:32:57 112642 [452BB940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:257 (Bad P_Key) from LID:1 GID:fe80::2:c903:2:6053
Apr 13 17:32:57 282788 [416B5940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x000000000000018e
Apr 13 17:32:57 282817 [416B5940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:32:58 280801 [42AB7940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x000000000000018f
Apr 13 17:32:58 280828 [42AB7940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:32:58 761835 [434B8940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:257 (Bad P_Key) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000742
Apr 13 17:32:58 761858 [434B8940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:257 (Bad P_Key) from LID:1 GID:fe80::2:c903:2:6053
Apr 13 17:32:59 278816 [452BB940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x0000000000000190
Apr 13 17:32:59 278835 [452BB940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:00 276841 [416B5940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x0000000000000191
Apr 13 17:33:00 276862 [416B5940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:03 459759 [42AB7940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:257 (Bad P_Key) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000743
Apr 13 17:33:03 459785 [42AB7940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:257 (Bad P_Key) from LID:1 GID:fe80::2:c903:2:6053
Apr 13 17:33:04 268908 [434B8940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x0000000000000192
Apr 13 17:33:04 268927 [434B8940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:05 266929 [452BB940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x0000000000000193
Apr 13 17:33:05 266950 [452BB940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:10 456664 [420B6940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:257 (Bad P_Key) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000744
Apr 13 17:33:10 456690 [420B6940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:257 (Bad P_Key) from LID:1 GID:fe80::2:c903:2:6053
Apr 13 17:33:11 255037 [43EB9940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x0000000000000194
Apr 13 17:33:11 255083 [43EB9940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:259 (Bad P_Key (switch external port)) from LID:11 GID:fe80::2:c902:40:46f8
Apr 13 17:33:12 253054 [45CBC940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x0000000000000195

Chris
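With this many repeated traps, it can help to tally them by source LID and notice number rather than read them line by line. The following is a rough sketch, not an OpenSM tool: the regular expression is inferred only from the log lines pasted above, and the function name is hypothetical. It counts only the "Received Generic Notice" lines so the paired "Reporting" lines are not double-counted.

```python
import re
from collections import Counter

# Pattern inferred from the pasted log lines; other OpenSM versions may differ.
NOTICE_RE = re.compile(
    r"Received Generic Notice type:(?P<type>\d+) num:(?P<num>\d+) "
    r"\((?P<desc>.*?)\) Producer:\d+ \(.*?\) from LID:(?P<lid>\d+)"
)

SAMPLE = """\
Apr 13 17:32:57 112574 [452BB940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:257 (Bad P_Key) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000741
Apr 13 17:32:57 112642 [452BB940] 0x02 -> osm_report_notice: Reporting Generic Notice type:2 num:257 (Bad P_Key) from LID:1 GID:fe80::2:c903:2:6053
Apr 13 17:32:57 282788 [416B5940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x000000000000018e
Apr 13 17:32:58 280801 [42AB7940] 0x01 -> __osm_trap_rcv_process_request: Received Generic Notice type:2 num:259 (Bad P_Key (switch external port)) Producer:2 (Switch) from LID:11 TID:0x000000000000018f
"""


def tally_notices(lines):
    """Count received notices keyed by (LID, notice number, description)."""
    counts = Counter()
    for line in lines:
        m = NOTICE_RE.search(line)
        if m:
            counts[(int(m["lid"]), int(m["num"]), m["desc"])] += 1
    return counts


if __name__ == "__main__":
    for (lid, num, desc), n in sorted(tally_notices(SAMPLE.splitlines()).items()):
        print(f"LID {lid:>3}  num {num}  {desc}: {n}")
```

To run it against a real log, pass an open file object instead of `SAMPLE.splitlines()`; the log path depends on how OpenSM was started, so it is left out here.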
