On Apr 14, 2010, at 3:23 AM, Sasha Khapyorsky wrote:

On 13:44 Tue 13 Apr     , Ira Weiny wrote:

This changes the logic. "num_smps_outstanding" is NOT the number on the wire, but it appears you have made it so.

Actually, yes, the patch made it so.

This is the number which will cause process_smp_queue to continue being called.

If you are going to do this I think you need to change process_mads as well as process_one_recv. We discussed process_one_recv in the error case.

process_one_recv() failure breaks the loop anyway.

What were you trying to fix?

OK, I think I see. We should move cl_qmap_insert to after a successful umad_send, and putting total_smps there is fine. But I think num_smps_outstanding should be put back.

But then process_mads() would loop forever after a single
send_smp() failure (with all queues empty and umad_recv() running
without a timeout).

But moving the cl_qmap_insert below the send call fixes that. However, it does cause a memory leak, because on a failed send the smp is no longer on the smp_queue_head list and is never inserted into the map. It needs to be put back on that list and retried, with a limit on the retries (to prevent what you describe here). Are you seeing a hang?
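
A minimal sketch of that requeue-with-retry-limit idea, under stated assumptions: every name below (struct engine, process_queue, push_head, MAX_SEND_RETRIES, flaky_send) is hypothetical and is not the actual libibnetdisc code. The send callback stands in for umad_send(), and a plain linked list stands in for the cl_qmap of SMPs on the wire. The point is only the ordering: an smp is tracked as outstanding after a successful send, and a failed send either goes back on the head of the queue or is freed once the limit is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SEND_RETRIES 3	/* hypothetical retry limit */

struct smp {
	struct smp *next;
	int retries;		/* failed send attempts so far */
};

struct engine {
	struct smp *queue_head;	/* queued, not yet sent */
	struct smp *queue_tail;
	struct smp *on_wire;	/* sent, waiting for a reply */
	int num_on_wire;
	int max_on_wire;
	int total_smps;		/* counts successful sends only */
	int dropped;		/* given up after MAX_SEND_RETRIES */
	int (*send)(struct smp *);	/* stand-in for umad_send(); 0 on success */
};

/* Test stub: fails while fail_budget > 0, then succeeds. */
static int fail_budget;
static int flaky_send(struct smp *smp)
{
	(void)smp;
	if (fail_budget > 0) {
		fail_budget--;
		return -1;
	}
	return 0;
}

static void engine_init(struct engine *e, int max_on_wire,
			int (*send)(struct smp *))
{
	e->queue_head = e->queue_tail = e->on_wire = NULL;
	e->num_on_wire = e->total_smps = e->dropped = 0;
	e->max_on_wire = max_on_wire;
	e->send = send;
}

/* Append a new smp to the tail of the send queue. */
static int queue_smp(struct engine *e)
{
	struct smp *smp = calloc(1, sizeof(*smp));
	if (!smp)
		return -1;
	if (e->queue_tail)
		e->queue_tail->next = smp;
	else
		e->queue_head = smp;
	e->queue_tail = smp;
	return 0;
}

static struct smp *pop_head(struct engine *e)
{
	struct smp *smp = e->queue_head;
	if (smp) {
		e->queue_head = smp->next;
		if (!e->queue_head)
			e->queue_tail = NULL;
		smp->next = NULL;
	}
	return smp;
}

static void push_head(struct engine *e, struct smp *smp)
{
	smp->next = e->queue_head;
	e->queue_head = smp;
	if (!e->queue_tail)
		e->queue_tail = smp;
}

/* Send queued SMPs until the wire is full or the queue is empty. */
static int process_queue(struct engine *e)
{
	while (e->num_on_wire < e->max_on_wire) {
		struct smp *smp = pop_head(e);
		if (!smp)
			return 0;
		if (e->send(smp) != 0) {
			if (++smp->retries >= MAX_SEND_RETRIES) {
				e->dropped++;	/* give up: free it, never track it */
				free(smp);
				continue;
			}
			push_head(e, smp);	/* still owned by the queue: no leak */
			return -1;
		}
		/* Only a successful send is tracked as outstanding. */
		smp->next = e->on_wire;
		e->on_wire = smp;
		e->num_on_wire++;
		e->total_smps++;
	}
	return 0;
}

/* A reply arrived for one outstanding smp: release it. */
static void complete_one(struct engine *e)
{
	struct smp *smp = e->on_wire;
	assert(smp != NULL);
	e->on_wire = smp->next;
	e->num_on_wire--;
	free(smp);
}
```

With this shape, a receive loop that runs only while num_on_wire is non-zero cannot spin on an smp that never left the host, and the retry limit bounds how long a persistently failing send is retried.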

I have seen a hang when running "iblinkinfo -S <guid>". However, the problem is not with send_smp. I am seeing the mad going on the wire and returning (according to madeye) but I am not receiving it from umad_recv. I don't know why. If I run with 1 outstanding mad it works???

Ira


Sasha

