ethervoid opened a new issue, #3356: URL: https://github.com/apache/kvrocks/issues/3356
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/kvrocks/issues) and found no similar issues.

### Version

- Master: Kvrocks 2.10.1
- Replica: Kvrocks 2.14.0
- OS: Linux 6.12.53-69.119.amzn2023.aarch64 (Amazon Linux 2023)

### Minimal reproduce step

1. Set up a master-replica configuration with:
   - High write rate (~7K ops/sec)
   - Small WAL retention: `rocksdb.max_total_wal_size 1024` (1GB)
2. Introduce network congestion or slowness on the replica side so that it consumes replication data more slowly than the master produces it.
3. The TCP send buffer on the master fills up, and the replication feed thread blocks on write() indefinitely.
4. Wait for WAL rotation to prune old WAL files while the feed thread is still blocked.
5. When the connection eventually drops or the thread unblocks, observe:
   - Master logs: "Fatal error encountered, WAL iterator is discrete, some seq might be lost"
   - The replica attempts a psync and fails with "sequence out of range"
   - A full resync is triggered

The problem is that step 3 can last indefinitely (we observed 44 hours), with no timeout, errors, or warnings logged.

### What did you expect to see?

1. The master should detect when a replica falls too far behind and proactively disconnect it before the WAL is exhausted.
2. Socket sends to replicas should have a timeout to prevent indefinite blocking.
3. A warning should be logged when replication lag grows significantly.
4. When disconnected early (while the sequence is still in the WAL), the replica should be able to psync successfully on reconnect instead of requiring a full resync.

### What did you see instead?

The replication feed thread blocked for 44 hours with no logs or errors:

    I20260127 22:16:21.006304 2857 replication.cc:115] WAL was rotated, would reopen again
    [... 44 hours of silence ...]
    I20260129 18:36:55.603111 2857 replication.cc:115] WAL was rotated, would reopen again
    E20260129 18:36:55.646749 2857 replication.cc:126] Fatal error encountered, WAL iterator is discrete, some seq might be lost, sequence 480156205527 expected, but got 481055967952
    W20260129 18:36:55.646785 2857 replication.cc:84] Slave thread was terminated

The replica then failed to psync ("sequence out of range") and required a full resync.

Root cause: in `FeedSlaveThread::loop()`, the call to `util::SockSend()` (line 225) blocks indefinitely when the TCP buffer is full, because the underlying `WriteImpl()` has no timeout mechanism. While the thread is blocked, the master keeps writing and WAL files are pruned, so the replica's sequence is no longer available.

### Anything Else?

I've drafted a possible solution (using Claude Code, since I'm not a C++ expert or developer) that could address this issue with three components; rough sketches of each follow below:

1. **Socket send timeout**: a new `SockSendWithTimeout()` function using poll() with a configurable timeout (default 30s)
2. **Replication lag detection**: check the lag at the start of each loop iteration and disconnect the replica if it exceeds a configurable threshold (default 100M sequences)
3. **Exponential backoff on reconnection**: prevents rapid reconnect loops for persistently slow replicas (1s, 2s, 4s, ... up to 60s)

New configuration options:

- `max-replication-lag`: maximum sequence lag before disconnecting a slow consumer
- `replication-send-timeout-ms`: socket send timeout in milliseconds

I'm happy to submit a PR with the proposed fix.
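A minimal sketch of the poll()-based send timeout (component 1). The name `SockSendWithTimeout()`, its signature, and the plain `bool` return are part of this proposal rather than existing kvrocks API, and the sketch assumes the replication socket is non-blocking:

```cpp
// Proposed helper for src/common/io_util.cc: send `data` on a non-blocking
// socket, waiting at most `timeout_ms` for the socket to become writable
// between partial writes. Returns false on timeout or a hard socket error.
#include <poll.h>
#include <sys/socket.h>

#include <cerrno>
#include <cstddef>
#include <string>

bool SockSendWithTimeout(int fd, const std::string &data, int timeout_ms) {
  size_t sent = 0;
  while (sent < data.size()) {
    ssize_t n = send(fd, data.data() + sent, data.size() - sent, MSG_NOSIGNAL);
    if (n > 0) {
      sent += static_cast<size_t>(n);
      continue;
    }
    if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
      return false;  // hard socket error
    }
    // The send buffer is full: wait until the socket is writable, or give up.
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0) return false;  // timeout (0) or poll error (<0)
  }
  return true;
}
```

The idea is that `FeedSlaveThread::loop()` would call this instead of the plain `util::SockSend()`, and treat a timeout like a broken connection: drop the replica (while its sequence is still in the WAL) rather than block forever.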
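For the lag check (component 2), the intent is a cheap comparison at the top of each feed-loop iteration. The function and parameter names below are placeholders, not actual kvrocks identifiers:

```cpp
#include <cstdint>

// Returns true if the replica should be disconnected because it has fallen
// more than `max_replication_lag` sequences behind the master's latest WAL
// sequence. Intended to be called once per FeedSlaveThread::loop() iteration,
// e.g. ShouldDisconnectSlowReplica(latest_wal_seq, next_repl_seq, max_lag).
bool ShouldDisconnectSlowReplica(uint64_t master_latest_seq,
                                 uint64_t replica_next_seq,
                                 uint64_t max_replication_lag) {
  if (master_latest_seq <= replica_next_seq) return false;  // caught up
  return (master_latest_seq - replica_next_seq) > max_replication_lag;
}
```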
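For component 3, a small helper on the replica's reconnect path could implement the 1s-to-60s exponential backoff. Again, this is only a sketch with illustrative names:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <thread>

// Doubles the wait between reconnect attempts (1s, 2s, 4s, ...) up to a 60s
// cap; Reset() should be called after a successful (re)connection.
class ReconnectBackoff {
 public:
  void Wait() {
    std::this_thread::sleep_for(std::chrono::seconds(delay_s_));
    delay_s_ = std::min<uint64_t>(delay_s_ * 2, 60);
  }
  void Reset() { delay_s_ = 1; }

 private:
  uint64_t delay_s_ = 1;
};
```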
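In kvrocks.conf terms, the proposed options would look roughly like this (neither option exists in current kvrocks; names and defaults follow the sketch above):

```
# Proposed: disconnect a replica once it falls this many sequences behind.
max-replication-lag 100000000
# Proposed: abort a blocked replication socket send after this many milliseconds.
replication-send-timeout-ms 30000
```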
The changes would touch:

- src/config/config.h, config.cc (new config options)
- src/common/io_util.h, io_util.cc (SockSendWithTimeout)
- src/cluster/replication.h, replication.cc (lag detection, timeout usage, backoff)

Workaround for affected users: increase `rocksdb.max_total_wal_size` significantly (e.g., 16GB) to extend WAL retention and reduce the likelihood of exhaustion.

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
