I have been able to log the debug messages now; however, I see no errors that would indicate where the problem is.
Just to recap quickly: san-booting over InfiniBand using SRP doesn't work and just times out. The timeout occurs while waiting for a response to the SRP login request.

I'm fairly certain the problem lies within gPXE, because I can access the SRP target just fine through a local installation of Windows. In addition, on the SRP target side I have traced through the ib_srpt module and found that a login response is generated and sent (or at least posted to the mthca module's work queue).

On the gPXE side, I've found that I'm not receiving the SRP_LOGIN_RSP packet even at the InfiniBand protocol level (net/infiniband.c). So far I have been able to determine that the packet is lost at some point in the Arbel driver (drivers/infiniband/arbel.c) before arbel_complete(). This would indicate that the problem exists within the Arbel driver, and would explain why SRP sanboot worked with the Hermon driver. Despite compiling with DEBUG=arbel:3, I get no errors indicating any problems or dropped packets.

Here is the output from autoboot with DEBUG=srp,ipoib,arp,infiniband,ib_cm,ib_cmrc,ib_mcast,ib_mi,ib_packet,ib_pathrec,ib_sma,ib_smc,ib_srp

Note: I have added some debug messages to help illustrate the flow of packets. At the beginning of ipoib_complete_recv, ib_complete_recv, and ib_mi_complete_recv I have added "RX" debug messages.
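When reading a trace like the one that follows, it can help to pair each "MI ... TX TID" line with an "MI ... RX TID" line carrying the same transaction ID, so that a request that never gets an answer stands out. The Python sketch below is only an illustration: the regular expression is inferred from the line format visible in this output, and the names MI_LINE, SAMPLE and unanswered_tids are hypothetical helpers, not part of gPXE.

```python
import re
from collections import Counter

# Assumed line format, inferred from the debug output in this message, e.g.
#   MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI_LINE = re.compile(r"MI 0x[0-9a-f]+ (TX|RX) TID ([0-9a-f]{16}) \(")

# Short excerpt in the same format, used only to exercise the helper.
SAMPLE = """\
MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
"""


def unanswered_tids(log_text):
    """Return {tid: tx_count} for TIDs that were sent but never received."""
    tx_counts = Counter()
    rx_seen = set()
    for direction, tid in MI_LINE.findall(log_text):
        if direction == "TX":
            tx_counts[tid] += 1
        else:
            rx_seen.add(tid)
    return {tid: n for tid, n in tx_counts.items() if tid not in rx_seen}


if __name__ == "__main__":
    for tid, count in unanswered_tids(SAMPLE).items():
        print(f"TID {tid}: sent {count} time(s), no response seen")
```

Run against the full output, this should single out the transaction that is retried and then abandoned just before the connection times out.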

Booting from root path "ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4"
SRP 0xbb134 using ib_srp::::fe800000000000000002c9020022e5e5::0002c9020022e5e4::0002c9020022e5e4:0002c9020022e5e4
SRP attached successfully
IBDEV 0xb9a84 creating completion queue
IBDEV 0xb9a84 created 8-entry completion queue 0xbb4c4 (0xbb214) with CQN 0x83
IBDEV 0xb9a84 creating queue pair
IBDEV 0xb9a84 created queue pair 0xbb4f4 (0xbb5c4) with QPN 0x550403
IBDEV 0xb9a84 QPN 0x550403 has 4 send entries at [0xbb5a0,0xbb5b0)
IBDEV 0xb9a84 QPN 0x550403 has 2 receive entries at [0xbb5b0,0xbb5b8)
CMRC 0xbb1b4 using QPN 550403
SRP 0xbb134 TX login request tag 0000000000000001
CM 0xbbb64 created for IBDEV 0xb9a84 QPN 550403
CM 0xbbb64 connecting to fe800000:00000000:0002c902:0022e5e5 0002c902:0022e5e4
MI 0xba564 TX TID 6750584500000003 (03,02,01,0035) status 0000
infiniband RX
MI 0xba564 RX
MI 0xba564 RX TID 6750584500000003 (03,02,81,0035) status 0000
IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
infiniband RX
IPoIB 0xb9ccc RX
ARP cache add: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5
ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035
IPoIB peer 4 has MAC 80000404:fe800000:00000000:0002c902:0022e5e5
MI 0xba564 TX TID 6750584500000005 (03,02,01,0035) status 0000
infiniband RX
MI 0xba564 RX
MI 0xba564 RX TID 6750584500000005 (03,02,81,0035) status 0000
MI 0xba564 RX TID 6750584500000005 handling via transaction handler
IBDEV 0xb9a84 path to fe800000:00000000:0002c902:0022e5e5 is 0007 sl 0 rate 6
infiniband RX
IPoIB 0xb9ccc RX
ARP cache update: IP 10.20.76.1 => IPoIB 80000404:fe800000:00000000:0002c902:0022e5e5
ARP reply: IP 10.20.76.45 => IPoIB 00550402:fe800000:00000000:0002c902:00243035
MI 0xba564 TX TID 6750584500000004 (07,02,03,0010) status 0000
MI 0xba564 abandoning TID 6750584500000004
CM 0xbbb64 connection request failed: Connection timed out (0x4c206035)
CMRC 0xbb1b4 disconnected: Connection timed out (0x4c206035)
SRP 0xbb134 socket closed: Connection timed out (0x4c206035)

From: Itay Gazit [mailto:itayga...@gmail.com]
Sent: Friday, June 25, 2010 11:47 AM
To: Stefan Hajnoczi; M Lowe
Cc: etherboot-disc...@lists.sourceforge.net; gpxe; Michael Brown
Subject: Re: [Etherboot-discuss] SRP timeout

Hi Matthew,

Stefan is right, you should reduce the DEBUG messages depth to find the fail cause. I have tried SRP boot only with Hermon driver (ConnectX) and it worked for me.

Regards,
Itay

_______________________________________________
gPXE mailing list
gPXE@etherboot.org
http://etherboot.org/mailman/listinfo/gpxe