Not a big deal. Vlad, can you pull librdmacm 1.0.14.1 into the next OFED 1.5.3 RC? The only change versus 1.0.14 is reverting a patch to the rping sample.
Thanks,
Sean

> -----Original Message-----
> From: Steve Wise [mailto:[email protected]]
> Sent: Tuesday, February 15, 2011 5:57 PM
> To: Hefty, Sean
> Cc: OpenFabrics EWG; Tziporet Koren
> Subject: Re: rping/cxgb3 regression
>
> I pulled it down, built/installed it on 2 nodes, then ran a bunch of
> rpings. No hangs. Looks good!
>
> Thanks Sean. Sorry about this.
>
> Steve.
>
> On 2/15/2011 7:46 PM, Hefty, Sean wrote:
> > I placed a 1.0.14.1 package on the ofa server in the downloads/rdmacm
> > section. Can you verify that it works? If so, I'll ask to pull it into
> > 1.5.3
> >
> >> -----Original Message-----
> >> From: Steve Wise [mailto:[email protected]]
> >> Sent: Tuesday, February 15, 2011 10:37 AM
> >> To: Hefty, Sean
> >> Cc: OpenFabrics EWG; Tziporet Koren
> >> Subject: Re: rping/cxgb3 regression
> >>
> >>
> >> On 02/15/2011 12:18 PM, Hefty, Sean wrote:
> >>>> I'm wondering if pulling the rping changes for ofed-1.5.3 would be ok? I
> >>>> guess to do this you would have to push a 1-off librdmacm without those
> >>>> changes? Or maybe back up what is in OFED-1.5.3 to the previous release
> >>>> without this rping change?
> >>>>
> >>>> Thoughts?
> >>> Is the commit (93635fa33b41d356fa096242fec4ce788194b42f) below the issue?
> >>> (Btw, the author listed in my git tree is wrong.)
> >> Yes.
> >>
> >>> I don't think I want to drop back to 1.0.13 for 1.5.3, so maybe reverting
> >>> this change and pushing out 1.0.14.1 would work. There's just one other
> >>> change after 1.0.14 at the moment, and it's to the build, so I'd skip a
> >>> full release for now.
> >>> Let me know if you think this would work.
> >>>
> >> I just tested that removing this from 1.0.14 will resolve the issue for
> >> 1.5.3.
> >>
> >>
> >>> - Sean
> >>>
> >>> ---
> >>>
> >>> librdmacm/rping: Make sure CQ event thread exits before destroying the CQ
> >>> It is possible for the CQ event thread to poll the CQ after it has been
> >>> destroyed which can result in a seg fault on T3 interfaces. This patch
> >>> waits for the thread to exit before destroying the CQ.
> >>>
> >>> Signed-off-by: Steve Wise <[email protected]>
> >>> Signed-off-by: Sean Hefty <[email protected]>
> >>>
> >>> diff --git a/examples/rping.c b/examples/rping.c
> >>> index 2d4c2de..ee292ec 100644
> >>> --- a/examples/rping.c
> >>> +++ b/examples/rping.c
> >>> @@ -280,12 +280,11 @@ static int rping_cq_event_handler(struct rping_cb *cb)
> >>>  		ret = 0;
> >>>
> >>>  		if (wc.status) {
> >>> -			if (wc.status != IBV_WC_WR_FLUSH_ERR) {
> >>> +			if (wc.status != IBV_WC_WR_FLUSH_ERR)
> >>>  				fprintf(stderr,
> >>>  					"cq completion failed status %d\n",
> >>>  					wc.status);
> >>> -				ret = -1;
> >>> -			}
> >>> +			ret = -1;
> >>>  			goto error;
> >>>  		}
> >>>
> >>> @@ -802,10 +801,9 @@ static void *rping_persistent_server_thread(void *arg)
> >>>  	rping_test_server(cb);
> >>>  	rdma_disconnect(cb->child_cm_id);
> >>> +	pthread_join(cb->cqthread, NULL);
> >>>  	rping_free_buffers(cb);
> >>>  	rping_free_qp(cb);
> >>> -	pthread_cancel(cb->cqthread);
> >>> -	pthread_join(cb->cqthread, NULL);
> >>>  	rdma_destroy_id(cb->child_cm_id);
> >>>  	free_cb(cb);
> >>>  	return NULL;
> >>> @@ -890,6 +888,7 @@ static int rping_run_server(struct rping_cb *cb)
> >>>
> >>>  	rping_test_server(cb);
> >>>  	rdma_disconnect(cb->child_cm_id);
> >>> +	pthread_join(cb->cqthread, NULL);
> >>>  	rdma_destroy_id(cb->child_cm_id);
> >>>  err2:
> >>>  	rping_free_buffers(cb);
> >>> @@ -1057,6 +1056,7 @@ static int rping_run_client(struct rping_cb *cb)
> >>>
> >>>  	rping_test_client(cb);
> >>>  	rdma_disconnect(cb->cm_id);
> >>> +	pthread_join(cb->cqthread, NULL);
> >>>  err2:
> >>>  	rping_free_buffers(cb);
> >>>  err1:
