Steve Wise wrote:
Craig Prescott wrote:
Steve Wise wrote:
Craig Prescott wrote:
Steve Wise wrote:

Craig Prescott wrote:

The above call now also emits a couple of messages
into the listener's syslog:

Jan 9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
Jan 9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000

This is an async event generated due to a failure processing an SQ WR, I think. Opcodes and status codes for iw_cxgb3 are in cxio_wr.h.
Type 1 means it was an egress (SQ) failure, and
status 0x6 is a base/bounds violation,
but opcode 14 seems incorrect. That's not a valid T3 opcode.
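
For anyone following along, here is a rough user-space sketch of pulling the fields out of one of those syslog lines and labelling the two values explained above. It only knows what was said in this thread (type 1 = egress/SQ, status 0x6 = base/bounds violation); everything else is deferred to cxio_wr.h. The struct and function names are made up for illustration, and it assumes the un-prefixed fields (opcode, type) are printed in decimal.

```c
#include <stdio.h>
#include <string.h>

/* Fields printed by the "CQE Err" / "AE" syslog lines quoted above.
 * Hypothetical helper type, not part of iw_cxgb3. */
struct cqe_log {
    unsigned int qpid, opcode, status, type, wrid_hi, wrid_lo;
};

/* Parse one log line. Returns 0 on success, -1 if the fields are missing.
 * Assumption: opcode and type are decimal, the 0x-prefixed fields are hex. */
static int parse_cqe_log(const char *line, struct cqe_log *out)
{
    const char *p = strstr(line, "qpid ");
    int n;

    if (!p)
        return -1;
    n = sscanf(p,
               "qpid %x opcode %u status %x type %u wrid.hi %x wrid.lo %x",
               &out->qpid, &out->opcode, &out->status, &out->type,
               &out->wrid_hi, &out->wrid_lo);
    return n == 6 ? 0 : -1;
}

/* Only type 1 was explained in the thread. */
static const char *describe_type(unsigned int type)
{
    return type == 1 ? "egress (SQ) failure" : "not covered here";
}

/* Only status 0x6 was explained in the thread. */
static const char *describe_status(unsigned int status)
{
    return status == 0x6 ? "base/bounds violation" : "see cxio_wr.h";
}

/* Print a one-line summary of a parsed entry. */
static void print_cqe_log(const struct cqe_log *e)
{
    printf("qpid 0x%x: type %u (%s), status 0x%x (%s), opcode %u\n",
           e->qpid, e->type, describe_type(e->type),
           e->status, describe_status(e->status), e->opcode);
}
```

Feeding it the first syslog line above and calling print_cqe_log() should report the SQ failure and the base/bounds violation; the opcode is printed as-is since it does not map to anything identified in this thread.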


Ok, thanks!  I guess I'm not sure what to make of that yet, though.


See where in iwch_accept_cr() the failure is happening. It doesn't look like send_mpa_reply() is being called.


The ECONNRESET is coming from here in iwch_accept_cr():

...
        /* wait for wr_ack */
        wait_event(ep->com.waitq, ep->com.rpl_done);
        err = ep->com.rpl_err;
...

Is that what you thought was happening?

I don't know exactly what is going on! But the code above means that the firmware never successfully sent the last streaming message (the mpa-start reply) and never transitioned the connection into RDMA mode. And the async error might indicate that some WR was posted prior to doing the rdma_accept() and that WR had problems.

Ok.  I'm sorry for such a slow response.

A few questions:

What firmware are you running?  ethtool -i will tell you.

[EMAIL PROTECTED] ~]# ethtool -i eth4
driver: cxgb3
version: 1.0-ko
firmware-version: T 5.0.0 TP 1.1.0
bus-info: 0000:86:00.0

Which OFED version exactly?

OFED 1.3 daily from a few weeks back now: OFED-1.3-20080107-0942

Does SDP post an SQ or RQ WR prior to doing the rdma_accept()? Can you dump that work request? Maybe in iwch_post_send and iwch_post_recv, dump the work request after it is built and before the code rings the doorbell. You can dump it as 8B flits, and be sure to put the flits in host byte order. See cxio_dump_wqe() in cxio_dbg.c...
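
To illustrate the kind of dump meant here, a minimal user-space sketch follows. It is not cxio_dump_wqe() itself: the function names are hypothetical, and it assumes the built WQE sits in memory as big-endian 8-byte flits, which is why each flit is converted before printing. In the driver you would use printk and the kernel byte-order helpers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Load one 8-byte flit stored big-endian in memory and return it in host
 * byte order. Reading byte by byte works on any host endianness. */
static uint64_t flit_to_host(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Format a flit value as 16 hex digits plus a terminating NUL. */
static void flit_hex(char out[17], uint64_t v)
{
    snprintf(out, 17, "%016llx", (unsigned long long)v);
}

/* Dump nflits flits of a WQE, one per line, tagged with the caller's name
 * (hypothetical helper; call it after the WR is built and before the
 * doorbell is rung). */
static void dump_wqe(FILE *out, const char *tag, const void *wqe, size_t nflits)
{
    const unsigned char *p = wqe;
    char hex[17];
    size_t i;

    for (i = 0; i < nflits; i++, p += 8) {
        flit_hex(hex, flit_to_host(p));
        fprintf(out, "%s: WQE %p: %s\n", tag, (const void *)p, hex);
    }
}
```

Calling dump_wqe(stderr, "iwch_post_receive", wqe, 13) on a built work request would give one line per flit, with the address of each flit followed by its value in host byte order.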

The following is the last work request seen before rdma_accept():

iwch_post_receive: Dumping built work request before ring_doorbell:
iwch_post_receive: WQE ffff810241d59f80: 17c001008000000d
iwch_post_receive: WQE ffff810241d59f88: 0000000000000000
iwch_post_receive: WQE ffff810241d59f90: 0000000000000001
iwch_post_receive: WQE ffff810241d59f98: 000002ff00000810
iwch_post_receive: WQE ffff810241d59fa0: 000000044eac6000
iwch_post_receive: WQE ffff810241d59fa8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fb0: 0000000000000000
iwch_post_receive: WQE ffff810241d59fb8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fc0: 0000000000000000
iwch_post_receive: WQE ffff810241d59fc8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fd0: 0000000000000000
iwch_post_receive: WQE ffff810241d59fd8: 0000000000000000
iwch_post_receive: WQE ffff810241d59fe0: 0000000000000000
iwch_post_receive: returning 0

This comes from sdp_init_qp(), via sdp_connect_handler().
There are a total of 64 work requests (all from
iwch_post_receive()) generated while the netserver is
trying to handle the RDMA_CM_EVENT_CONNECT_REQUEST.

Can you help me decode the above work request?

Thanks,
Craig



