eric-haibin-lin commented on a change in pull request #17007: [BUGFIX] Fix race condition in kvstore.pushpull
URL: https://github.com/apache/incubator-mxnet/pull/17007#discussion_r356790519
 
 

 ##########
 File path: src/kvstore/kvstore_dist_server.h
 ##########
 @@ -364,21 +364,34 @@ class KVStoreDistServer {
       if (log_verbose_)  {
         LOG(INFO) << "sent response to " << update_buf->request.size() << " workers";
       }
+      /**
+       * Request can be for either push, pull or pushpull
+       * If pull flag is set, respond immediately with the updated values
+       * Otherwise, only send the notification
+       */
+      bool has_pull = false;
       for (const auto& req : update_buf->request) {
-        /**
-         * Request can be for either push, pull or pushpull
-         * If pull flag is set, respond immediately with the updated values
-         * Otherwise, only send the notification
-         */
-        if (req.pull) {
-          DefaultStorageResponse(type, key, req, req_data, server);
-        } else {
+        has_pull = has_pull || req.pull;
+      }
+      if (has_pull) {
+        // if there is a pull request, perform WaitToRead() once before DefaultStorageResponse
+        if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
+        stored.WaitToRead();
+        for (const auto& req : update_buf->request) {
+          if (req.pull) {
+            DefaultStorageResponse(type, key, req, req_data, server);
+          }
+        }
+        update_buf->request.clear();
+      } else {
+        // otherwise, send response directly
+        for (const auto& req : update_buf->request) {
           server->Response(req);
         }
+        update_buf->request.clear();
+        if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
 
 Review comment:
   The order is different. In this branch, the copy is done after `Response` 
sends the ACK, so that we send the message back as early as possible. We then 
push the copy to the engine. 
   In the previous case, we need to respond with real data instead of an ACK, 
so we actually need to perform the copy first. 
   
