zhixinwen commented on code in PR #3077:
URL: https://github.com/apache/kvrocks/pull/3077#discussion_r2379954778
##########
src/cluster/batch_sender.cc:
##########
@@ -100,7 +100,7 @@ Status BatchSender::sendApplyBatchCmd(int fd, const rocksdb::WriteBatch &write_b
   GET_OR_RET(util::SockSend(fd, redis::ArrayOfBulkStrings({"APPLYBATCH", write_batch.Data()})));

-  std::string line = GET_OR_RET(util::SockReadLine(fd));
+  std::string line = GET_OR_RET(util::SockReadLineWithRetry(fd, 10, 500));

Review Comment:
I added some logging for debugging:

```
  GET_OR_RET(util::SockSend(fd, redis::ArrayOfBulkStrings({"APPLYBATCH", write_batch.Data()})));

  // INSERT_YOUR_CODE
  // Log the SO_RCVTIMEO (receive timeout) for fd
  struct timeval tv;
  socklen_t tv_len = sizeof(tv);
  if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &tv_len) == 0) {
    LOG(INFO) << "[migrate] fd " << fd << " SO_RCVTIMEO: " << tv.tv_sec << "s " << tv.tv_usec << "us";
  } else {
    LOG(WARNING) << "[migrate] Failed to get SO_RCVTIMEO for fd " << fd << ": " << strerror(errno);
  }

  std::string line = GET_OR_RET(util::SockReadLine(fd));
```

and I get `[2025-09-25T18:08:58.993539+00:00][I][batch_sender.cc:107] [migrate] fd 5488 SO_RCVTIMEO: 1s 0us`.

The 1s receive timeout comes from `checkMultipleResponses`, which sets it on the socket. Because the option stays on the fd, it later applies to the `APPLYBATCH` read as well. The `Resource temporarily unavailable` error is the result of that timeout expiring.

I think fixing compaction is the best way to go, but we should think about how to retry failures in general.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@kvrocks.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
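
For reference, the diagnosis in the review comment (a leftover `SO_RCVTIMEO` turning a slow reply into `Resource temporarily unavailable`) can be reproduced outside kvrocks. The following is a minimal sketch, assuming a Linux/POSIX environment. `SetRecvTimeout`, `RecvOnceErrno`, and `DemoTimeoutError` are hypothetical helper names for illustration only and are not part of the kvrocks code base. It sets a short receive timeout on one end of a socket pair, never writes to the other end, and checks that the blocked `recv` fails with `EAGAIN`/`EWOULDBLOCK`.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not from kvrocks): set the receive timeout on fd.
bool SetRecvTimeout(int fd, long sec, long usec) {
  struct timeval tv;
  tv.tv_sec = sec;
  tv.tv_usec = usec;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical helper: perform one blocking recv and return errno on
// failure, or 0 if the call succeeded (including an orderly shutdown).
int RecvOnceErrno(int fd) {
  char buf[16];
  ssize_t n = recv(fd, buf, sizeof(buf), 0);
  return n < 0 ? errno : 0;
}

// Create a connected socket pair, set a 100 ms receive timeout on one end,
// and read from it while the peer stays silent. Returns "timeout" when the
// read fails with EAGAIN/EWOULDBLOCK, otherwise a description of what
// happened instead.
std::string DemoTimeoutError() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return "socketpair failed";
  }

  std::string result;
  if (!SetRecvTimeout(fds[0], 0, 100000)) {
    result = "setsockopt failed";
  } else {
    int err = RecvOnceErrno(fds[0]);
    if (err == EAGAIN || err == EWOULDBLOCK) {
      // This is the errno that strerror() typically renders as
      // "Resource temporarily unavailable" on Linux.
      result = "timeout";
    } else {
      result = strerror(err);
    }
  }

  close(fds[0]);
  close(fds[1]);
  return result;
}
```

Because the option is a property of the socket rather than of a single call, any later read on the same fd inherits it, which matches the behaviour described above for the `APPLYBATCH` read.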