I think your fix looks right. But I'm having trouble working out why you'd want numbers that low (4, 2, 1), and exactly what our algorithm will re-post when they are that low. Why do you want them so low?
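To check that I'm reading the arithmetic in your report correctly, here is a small stand-alone sketch of it. It is only a model built from the three steps you list, not the actual openib BTL code: the `srq_model` struct, the function names, and the `-1` "early return" convention are all made up for illustration, and the guard at the end is just one plausible shape for a fix, not necessarily what your patch does.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the per-QP SRQ counters named in
 * the report. This is NOT the real Open MPI structure. */
struct srq_model {
    int rd_num;        /* configured receive count (4 in S,<size>,4,2,1) */
    int rd_init;       /* rd_num / 4, as the report describes for setup_qps() */
    int rd_curr_num;   /* starts at rd_init, as described for create_srq() */
    int rd_low_local;  /* rd_curr_num - (rd_curr_num >> 2) */
};

/* Reproduce the derived values from steps 1) and 2) of the report. */
struct srq_model srq_model_init(int rd_num)
{
    struct srq_model m;

    assert(rd_num > 0);
    m.rd_num = rd_num;
    m.rd_init = rd_num / 4;
    m.rd_curr_num = m.rd_init;
    m.rd_low_local = m.rd_curr_num - (m.rd_curr_num >> 2);
    return m;
}

/* Step 3): how many receives the post routine would try to post.
 * Returns -1 (an assumption of this sketch) when rd_posted is above the
 * low watermark, i.e. the routine would return before reaching the loop. */
int srq_model_num_post(const struct srq_model *m, int rd_posted)
{
    if (rd_posted > m->rd_low_local) {
        return -1;
    }
    return m->rd_curr_num - rd_posted;
}

/* Hypothetical guard: nonzero when the posting loop must not be entered,
 * so that no work-request pointer set inside the loop is used afterwards. */
int srq_model_must_skip_loop(const struct srq_model *m, int rd_posted)
{
    return srq_model_num_post(m, rd_posted) <= 0;
}
```

With `srq_model_init(4)` this gives `rd_init`, `rd_curr_num` and `rd_low_local` all equal to 1. Calling `srq_model_num_post()` with `rd_posted` set to 1 then returns 0 rather than -1, which is the case you describe: the watermark test does not return early, but there is nothing to post, so the loop body never runs.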
On Jun 18, 2010, at 11:10 AM, nadia.derbey wrote:

> Hi,
>
> Reference is the v1.5 branch
>
> If an SRQ has the following settings: S,<size>,4,2,1
>
> 1) setup_qps() sets the following:
>    mca_btl_openib_component.qp_infos[qp].u.srq_qp.rd_num=4
>    mca_btl_openib_component.qp_infos[qp].u.srq_qp.rd_init=rd_num/4=1
>
> 2) create_srq() sets the following:
>    openib_btl->qps[qp].u.srq_qp.rd_curr_num = 1 (rd_init value)
>    openib_btl->qps[qp].u.srq_qp.rd_low_local = rd_curr_num - (rd_curr_num >> 2) = rd_curr_num = 1
>
> 3) if mca_btl_openib_post_srr() is called with rd_posted=1:
>    rd_posted > rd_low_local is false
>    num_post=rd_curr_num-rd_posted=0
>    the loop is not executed
>    wr is never initialized (remains NULL)
>    wr->next: address not mapped
>    ==> SIGSEGV
>
> The attached patch solves the problem by ensuring that we'll actually
> enter the loop and leave otherwise.
> Can someone have a look please: the patch solves the problem with my
> reproducer, but I'm not sure the fix covers all the situations.
>
> Regards,
> Nadia
>
> <001_openib_low_rd_num.patch>

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/