randomkang commented on PR #3145:
URL: https://github.com/apache/brpc/pull/3145#issuecomment-3585718760

   > > > > @chenBright Hi, I used the 
patch (commit: db3673acf276d9baec8aff95afd07bdbc5811437) in my task, but I still 
get the error: "[wk-10] W1124 07:27:34.681490 82718 0 
external/brpc/src/brpc/rdma/rdma_endpoint.cpp:952 CutFromIOBufList] Fail to 
ibv_post_send: Cannot allocate memory, remote_rq_window_size=57, 
sq_window_size=5, sq_current=5".
   > > > > My task is model training, and brpc GDR is enabled.
   > > > 
   > > > 
   > > > Could it be that some unexpected imms are occupying the SQ? Try setting 
sq_size back to its original value of sq_size * 5 / 4.
   > > > ```c++
   > > > resource->qp = AllocateQp(resource->send_cq, resource->recv_cq, 
sq_size * 5 / 4, rq_size);
   > > > ```
   > > 
   > > 
   > > I set sq_size back to its original value of sq_size * 5 / 4 and ran the 
same task twice. One run lasted 1383 minutes and the other lasted 752 minutes. 
The "Fail to ibv_post_send: Cannot allocate memory" error did not occur. 
@chenBright @yanglimingcn
   > 
   > Update: one task ran for 1063 minutes and then reported an error: "[wk-7] W1127 
16:13:16.662405 67606 158119221015812 
external/brpc/src/brpc/rdma/rdma_endpoint.cpp:997 SendImm] Fail to 
ibv_post_send: Cannot allocate memory".
   
   Update2: The other task ran for 1902 minutes and also reported an error: "[wk-13] 
W1127 19:41:55.088111 98359 102791452311833 
external/brpc/src/brpc/rdma/rdma_endpoint.cpp:997 SendImm] Fail to 
ibv_post_send: Cannot allocate memory".
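
To make the hypothesis quoted above concrete (imms occupying SQ slots), here is a toy model, not brpc code and not libibverbs. It assumes that a post fails with ENOMEM when every SQ slot is in use, which is how ibv_post_send typically reports a full send queue, and that data sends and imm-only sends share the same slots. `ToySendQueue`, `CountFailedPosts` and the window size of 100 are hypothetical names and values chosen for illustration.

```cpp
#include <cerrno>
#include <cstdint>

// Toy stand-in for a send queue (SQ). Hypothetical type, not part of brpc.
class ToySendQueue {
public:
    explicit ToySendQueue(uint32_t capacity)
        : capacity_(capacity), outstanding_(0) {}

    // Takes one slot. Returns 0 on success, or ENOMEM when the queue is
    // full (the assumed failure mode behind "Cannot allocate memory").
    int Post() {
        if (outstanding_ >= capacity_) {
            return ENOMEM;
        }
        ++outstanding_;
        return 0;
    }

    // A send completion frees one slot.
    void Complete() {
        if (outstanding_ > 0) {
            --outstanding_;
        }
    }

    uint32_t outstanding() const { return outstanding_; }

private:
    uint32_t capacity_;
    uint32_t outstanding_;
};

// Posts data_window data sends followed by imm_count imm-only sends, with
// no completions in between, and returns how many posts failed.
int CountFailedPosts(uint32_t sq_capacity, uint32_t data_window,
                     uint32_t imm_count) {
    ToySendQueue sq(sq_capacity);
    int failures = 0;
    for (uint32_t i = 0; i < data_window + imm_count; ++i) {
        if (sq.Post() != 0) {
            ++failures;
        }
    }
    return failures;
}
```

In this model, a queue sized exactly to a data window of 100 rejects the first imm posted while the window is full, whereas 100 * 5 / 4 = 125 slots absorb up to 25 imms before the 26th fails. That would be consistent with the larger SQ only delaying the error if imms can accumulate without bound, but it is only a sketch of one possible cause.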


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

