randomkang commented on PR #3145:
URL: https://github.com/apache/brpc/pull/3145#issuecomment-3584770575
> > > @chenBright Hi, I used the patch (commit db3673acf276d9baec8aff95afd07bdbc5811437) in my task, but I still get the error: "[wk-10] W1124 07:27:34.681490 82718 0 external/brpc/src/brpc/rdma/rdma_endpoint.cpp:952 CutFromIOBufList] Fail to ibv_post_send: Cannot allocate memory, remote_rq_window_size=57, sq_window_size=5, sq_current=5".
> > > My task is model training, and brpc GDR is enabled.
> >
> > Could it be that some unexpected imms are occupying the SQ? Try setting sq_size back to its original value of sq_size * 5 / 4.
> > ```c++
> > resource->qp = AllocateQp(resource->send_cq, resource->recv_cq, sq_size * 5 / 4, rq_size);
> > ```
>
> I set sq_size back to its original value of sq_size * 5 / 4 and ran the same task twice. One run lasted 1383 minutes and the other lasted 752 minutes. The "Fail to ibv_post_send: Cannot allocate memory" error did not occur.

@chenBright @yanglimingcn
Update: one task ran for 1063 minutes and then reported an error: "[wk-7] W1127 16:13:16.662405 67606 158119221015812 external/brpc/src/brpc/rdma/rdma_endpoint.cpp:997 SendImm] Fail to ibv_post_send: Cannot allocate memory".

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
