On 09:52 Wed 14 Apr , Ira Weiny wrote:
>
> > But then it blocks process_mads() to loop forever after single
> > send_smp() failure (with all empty queues and umad_recv() running
> > without timeout).
>
> But moving the cl_qmap_insert below the send call fixes that.
It doesn't:
int process_mads(smp_engine_t * engine)
{
	int rc = 0;
	while (engine->num_smps_outstanding > 0) {
		if ((rc = process_smp_queue(engine)) != 0)
			return rc;
		while (!cl_is_qmap_empty(&engine->smps_on_wire))
			if ((rc = process_one_recv(engine)) != 0)
				return rc;
	}
	return 0;
}
After a send_smp() failure engine->num_smps_outstanding is still > 0 and
will never be decreased (tested).
> However, it does cause a memory leak because the smp is no longer in
> the smp_queue_head list.
You are right about the leak.
> It needs to be put back on that list to be
> retried with a limit on the retries (to prevent what you are saying
> here.)
We already have a retry mechanism implemented in umad_send(), so a
failed MAD should most likely just be dropped and freed:
diff --git a/infiniband-diags/libibnetdisc/src/query_smp.c b/infiniband-diags/libibnetdisc/src/query_smp.c
index 08e3ef7..89c0b05 100644
--- a/infiniband-diags/libibnetdisc/src/query_smp.c
+++ b/infiniband-diags/libibnetdisc/src/query_smp.c
@@ -96,8 +96,10 @@ static int process_smp_queue(smp_engine_t * engine)
 		if (!smp)
 			return 0;
 
-		if ((rc = send_smp(smp, engine->ibmad_port)) != 0)
+		if ((rc = send_smp(smp, engine->ibmad_port)) != 0) {
+			free(smp);
 			return rc;
+		}
 		engine->num_smps_outstanding++;
 		cl_qmap_insert(&engine->smps_on_wire, (uint32_t) smp->rpc.trid,
 			       (cl_map_item_t *) smp);
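
To show the drop-and-free idea in isolation, here is a minimal standalone
sketch. It is a toy model, not the libibnetdisc code: toy_smp, toy_engine,
toy_send(), toy_queue(), toy_process_queue() and toy_cleanup() are all
hypothetical names, and a plain linked list stands in for the
smps_on_wire map. It assumes, as in the hunk above, that the counter is
bumped only after a successful send.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for ibnd_smp_t and smp_engine_t. */
struct toy_smp {
	struct toy_smp *qnext;
	int fail_send;		/* set to make toy_send() fail for this request */
};

struct toy_engine {
	struct toy_smp *queue_head;	/* requests waiting to be sent */
	struct toy_smp *on_wire;	/* requests sent, awaiting a reply */
	int num_outstanding;		/* counts successfully sent requests */
};

/* Stand-in for send_smp(): returns 0 on success, -1 on failure. */
static int toy_send(struct toy_smp *smp)
{
	return smp->fail_send ? -1 : 0;
}

/* Append a new request to the tail of the send queue. */
static int toy_queue(struct toy_engine *e, int fail_send)
{
	struct toy_smp *smp = calloc(1, sizeof(*smp));
	struct toy_smp **pp = &e->queue_head;

	if (!smp)
		return -1;
	smp->fail_send = fail_send;
	while (*pp)
		pp = &(*pp)->qnext;
	*pp = smp;
	return 0;
}

/*
 * Send everything that is queued. A request that fails to send is
 * dropped and freed before returning, so it is neither leaked nor
 * counted as outstanding.
 */
static int toy_process_queue(struct toy_engine *e)
{
	struct toy_smp *smp;
	int rc;

	while ((smp = e->queue_head) != NULL) {
		e->queue_head = smp->qnext;
		if ((rc = toy_send(smp)) != 0) {
			free(smp);
			return rc;
		}
		e->num_outstanding++;
		smp->qnext = e->on_wire;
		e->on_wire = smp;
	}
	return 0;
}

/* Free whatever is still queued or on the wire. */
static void toy_cleanup(struct toy_engine *e)
{
	struct toy_smp *smp;

	while ((smp = e->queue_head) != NULL) {
		e->queue_head = smp->qnext;
		free(smp);
	}
	while ((smp = e->on_wire) != NULL) {
		e->on_wire = smp->qnext;
		free(smp);
	}
	e->num_outstanding = 0;
}
```

In this model the failed request leaves no count and no list entry
behind, so a later call to toy_process_queue() simply continues with
the remaining queued requests.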
> Are you seeing a hang?
I'm seeing an endless loop.
> I have seen a hang when running "iblinkinfo -S <guid>".
What do you mean by "hang"? An endless loop?
> However, the
> problem is not with send_smp. I am seeing the mad going on the wire
> and returning (according to madeye) but I am not receiving it from
> umad_recv. I don't know why. If I run with 1 outstanding mad it
> works???
Do you see this with current master? For me 'iblinkinfo -S' works fine,
but I have only two switches.
Sasha