zhixinwen commented on code in PR #3077:
URL: https://github.com/apache/kvrocks/pull/3077#discussion_r2379954778


##########
src/cluster/batch_sender.cc:
##########
@@ -100,7 +100,7 @@ Status BatchSender::sendApplyBatchCmd(int fd, const rocksdb::WriteBatch &write_b
 
   GET_OR_RET(util::SockSend(fd, redis::ArrayOfBulkStrings({"APPLYBATCH", write_batch.Data()})));
 
-  std::string line = GET_OR_RET(util::SockReadLine(fd));
+  std::string line = GET_OR_RET(util::SockReadLineWithRetry(fd, 10, 500));

Review Comment:
   I added some logging for debugging:
   ```
     GET_OR_RET(util::SockSend(fd, redis::ArrayOfBulkStrings({"APPLYBATCH", write_batch.Data()})));
     // Log the SO_RCVTIMEO (receive timeout) for fd
     struct timeval tv;
     socklen_t tv_len = sizeof(tv);
     if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &tv_len) == 0) {
       LOG(INFO) << "[migrate] fd " << fd << " SO_RCVTIMEO: " << tv.tv_sec << "s " << tv.tv_usec << "us";
     } else {
       LOG(WARNING) << "[migrate] Failed to get SO_RCVTIMEO for fd " << fd << ": " << strerror(errno);
     }
   
     std::string line = GET_OR_RET(util::SockReadLine(fd));
   ```
   
   and I get `[2025-09-25T18:08:58.993539+00:00][I][batch_sender.cc:107] [migrate] fd 5488 SO_RCVTIMEO: 1s 0us`.
   
   The 1s timeout is there because `checkMultipleResponses` sets it on the 
socket, and it stays in effect for the later `APPLYBATCH` read. The `Resource 
temporarily unavailable` error (`EAGAIN`) is caused by that timeout expiring.
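
To illustrate the mechanism, here is a minimal sketch, assuming Linux/POSIX sockets. The helper name `RecvErrnoAfterTimeout` is hypothetical and is not kvrocks code. It sets `SO_RCVTIMEO` on one end of a socket pair and reads while the peer stays silent, which is roughly what happens when the destination node is slow to answer `APPLYBATCH`.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper (not part of kvrocks): returns the errno left by recv()
// once the receive timeout expires, 0 if recv() unexpectedly succeeded,
// or -1 if the sockets could not be set up.
int RecvErrnoAfterTimeout(long timeout_ms) {
  // A zero timeval means "no timeout", so recv() would block forever.
  assert(timeout_ms > 0);

  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;

  struct timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  if (setsockopt(fds[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  // Nothing is ever written to fds[1], so this read can only time out.
  char buf[16];
  ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
  int err = (n < 0) ? errno : 0;

  close(fds[0]);
  close(fds[1]);
  return err;
}
```

Calling `RecvErrnoAfterTimeout(50)` should yield `EAGAIN` (or `EWOULDBLOCK`), whose message on Linux is `Resource temporarily unavailable`. The timeout is a property of the socket, so it applies to every later read on the same fd until it is changed.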
   
   I think fixing compaction is the best way to go, but we should also think 
about how to retry failures in general.
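
As a starting point for that discussion, here is one possible shape for a generic retry wrapper. This is only a sketch: `RetryOnTimeout`, its parameters, and the choice of which errno values count as transient are all assumptions of mine, not an existing kvrocks API and not the implementation of `SockReadLineWithRetry` in this PR.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of kvrocks): calls op() until it returns a
// non-negative value. Retries only when op() fails with a transient errno
// (EAGAIN/EWOULDBLOCK/EINTR), sleeping `delay` between attempts, and gives up
// after `max_attempts` calls. Returns the last result of op().
int RetryOnTimeout(const std::function<int()> &op, int max_attempts,
                   std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 1;; ++attempt) {
    errno = 0;
    int rc = op();
    if (rc >= 0) return rc;  // success

    bool transient = (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR);
    // Non-transient errors and exhausted attempts are reported to the caller.
    if (!transient || attempt >= max_attempts) return rc;

    std::this_thread::sleep_for(delay);
  }
}
```

A wrapper like this is only safe around operations that can be repeated. Retrying the read of a reply fits that description as long as any partially read data is kept between attempts, whereas re-sending `APPLYBATCH` itself could apply the batch twice.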
   



