Hi Craig,

Could you please dump the last four WRs, not just the last one?
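If it is easier to pull the dumps out of syslog afterwards, something like the following might do. It is only a rough sketch: it assumes the log lines look like the iwch_post_receive dump quoted further down in this thread, and the helper names (group_work_requests, last_work_requests) are made up for illustration.

```python
import re

# Assumed line shapes, copied from the dump quoted in this thread:
#   "iwch_post_receive: Dumping built work request before ring_doorbell:"
#   "iwch_post_receive: WQE ffff810241d59f80: 17c001008000000d"
START_RE = re.compile(r"(\w+): Dumping built work request")
FLIT_RE = re.compile(r"(\w+): WQE ([0-9a-fA-F]+): ([0-9a-fA-F]{16})")


def group_work_requests(lines):
    """Group dump lines into work requests: one list of (addr, flit) per WR."""
    wrs = []
    for line in lines:
        if START_RE.search(line):
            wrs.append([])  # a "Dumping built work request" line opens a new WR
            continue
        m = FLIT_RE.search(line)
        if m and wrs:
            # Store the WQE address and the 8-byte flit as integers.
            wrs[-1].append((int(m.group(2), 16), int(m.group(3), 16)))
    return wrs


def last_work_requests(lines, count=4):
    """Return the last `count` work requests found in the log lines."""
    wrs = group_work_requests(lines)
    return wrs[max(len(wrs) - count, 0):]
```

Feeding it the relevant syslog lines, e.g. last_work_requests(open(path).readlines(), 4) with path pointing at the log file, would give the four most recent dumps as lists of (address, flit) pairs.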
Thanks,
felix

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:general-[EMAIL PROTECTED] On Behalf Of Craig Prescott
> Sent: Wednesday, January 23, 2008 8:05 AM
> To: Steve Wise
> Cc: [email protected]
> Subject: Re: [ofa-general] SDP and iWARP
>
> Steve Wise wrote:
> > Craig Prescott wrote:
> >> Steve Wise wrote:
> >>> Craig Prescott wrote:
> >>>> Steve Wise wrote:
> >>>>>
> >>>>> Craig Prescott wrote:
> >>>>>>
> >>>>>> The above call also emits a couple of messages
> >>>>>> into the listener's syslog now :
> >>>>>>
> >>>>>> Jan 9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid
> >>>>>> 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
> >>>>>> Jan 9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode
> >>>>>> 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
> >>>>>>
> >>>>> This is an async event generated due to a failure processing a SQ
> >>>>> WR, I think. opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
> >>>>> type 1 means it was an egress (SQ) failure
> >>>>> status 0x6 is a base/bounds violation,
> >>>>> but 14 seems incorrect. That's not a valid T3 opcode. ????
> >>>>>
> >>>>
> >>>> Ok, thanks! I guess I'm not sure what to make of that yet, though.
> >>>>
> >>>
> >>> See where in iwch_accept_cr() the failure is happening. It doesn't
> >>> look like send_mpa_reply() is being called.
> >>>
> >>
> >> The ECONNRESET is coming from here in iwch_accept_cr():
> >>
> >> ...
> >> /* wait for wr_ack */
> >> wait_event(ep->com.waitq, ep->com.rpl_done);
> >> err = ep->com.rpl_err;
> >> ...
> >>
> >> Is that what you thought was happening?
> >
> > I don't know exactly what is going on! But the code above means that
> > the firmware never successfully sent the last streaming message (the
> > mpa-start reply) and never transitioned the connection into rdma mode.
> > And the async error might indicate that some WR was posted prior to
> > doing the rdma_accept() and that WR had problems.
>
> Ok.
> I'm sorry for such a slow response.
>
> > a few questions:
> >
> > What firmware are you running? ethtool -i will tell you.
>
> [EMAIL PROTECTED] ~]# ethtool -i eth4
> driver: cxgb3
> version: 1.0-ko
> firmware-version: T 5.0.0 TP 1.1.0
> bus-info: 0000:86:00.0
>
> > What ofed version exactly?
>
> OFED 1.3 daily from a few weeks back now: OFED-1.3-20080107-0942
>
> > Does sdp post a SQ or RQ WR prior to doing the rdma_accept()? Can you
> > dump that work request? Maybe in iwch_post_send and iwch_post_recv,
> > dump the work request after it is built and before the code rings the
> > doorbell. You can dump it as 8B flits, and be sure to put the flits in
> > host byte order. See cxio_dump_wqe() in cxio_dbg.c...
>
> The following is the last work request seen before rdma_accept():
>
> iwch_post_receive: Dumping built work request before ring_doorbell:
> iwch_post_receive: WQE ffff810241d59f80: 17c001008000000d
> iwch_post_receive: WQE ffff810241d59f88: 0000000000000000
> iwch_post_receive: WQE ffff810241d59f90: 0000000000000001
> iwch_post_receive: WQE ffff810241d59f98: 000002ff00000810
> iwch_post_receive: WQE ffff810241d59fa0: 000000044eac6000
> iwch_post_receive: WQE ffff810241d59fa8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fb0: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fb8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fc0: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fc8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fd0: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fd8: 0000000000000000
> iwch_post_receive: WQE ffff810241d59fe0: 0000000000000000
> iwch_post_receive: returning 0
>
> This comes from sdp_init_qp(), via sdp_connect_handler().
> There are a total of 64 work requests (all from
> iwch_post_receive()) generated while the netserver is
> trying to handle the RDMA_CM_EVENT_CONNECT_REQUEST.
>
> Can you help me decode the above work request?
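As a starting point for decoding, here is a small user-space sketch of the two steps discussed above: turning raw WQE bytes into 8-byte flits in host order, and pulling a few fields out of the first (header) flit. The bit positions are assumptions on my part (opcode in the top 8 bits of the upper 32-bit word, generation bit in bit 31 and length in flits in the low 8 bits of the lower word), as is the big-endian in-memory layout, so please check all of it against cxio_wr.h before trusting it. The function names are hypothetical.

```python
import struct


def flits_from_bytes(raw):
    """Interpret raw WQE bytes as 8-byte flits and return them as ints.

    ASSUMPTION: the WQE is stored big-endian in memory, so ">Q" converts
    each flit to a host-order integer.
    """
    if len(raw) % 8:
        raise ValueError("WQE length must be a multiple of 8 bytes")
    return list(struct.unpack(">%dQ" % (len(raw) // 8), raw))


def split_header(flit):
    """Split the first flit into fields under the ASSUMED layout above."""
    hi = flit >> 32          # upper 32-bit word of the flit
    lo = flit & 0xFFFFFFFF   # lower 32-bit word of the flit
    return {
        "opcode": (hi >> 24) & 0xFF,   # assumed: bits 31:24 of the upper word
        "gen": (lo >> 31) & 0x1,       # assumed: bit 31 of the lower word
        "len_flits": lo & 0xFF,        # assumed: bits 7:0 of the lower word
    }


header = split_header(flits_from_bytes(bytes.fromhex("17c001008000000d"))[0])
print({name: hex(value) for name, value in header.items()})
```

For the first flit quoted above, 17c001008000000d, this gives opcode 0x17, gen 1 and a length of 13 flits. The length at least agrees with the 13 WQE lines in the dump, which suggests the assumed position of that field is plausible, but it does not confirm the other fields.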
>
> Thanks,
> Craig
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
