empiredan commented on PR #1399: URL: https://github.com/apache/incubator-pegasus/pull/1399#issuecomment-1477666493
Currently the primary replica server responds to the client only after the write to rocksdb has completed. However, the rocksdb write interface may return `kCorruption` or `kIOError`; that error is passed back to the client, which then treats the request as failed. In fact, the logs on the primary and on all secondaries have already been written successfully, so the request should be considered successful. The client will retry the write, which leads to inconsistency for non-idempotent writes such as `incr`, `check_and_set` and `check_and_mutate`.

To solve this problem, I think we can make the rocksdb write asynchronous. If the asynchronous write to rocksdb fails, for example with `kCorruption` or `kIOError`, just remove the replica, move its rocksdb directory to `.err`, and hand the primary role over to one of the secondary replicas. Consistency is guaranteed if and only if the logs are consistent, so the result returned to the client does not need to wait for rocksdb.
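A rough sketch of the proposed flow, to make the ordering concrete. Everything in it is hypothetical and none of it is the Pegasus or rocksdb API: `Replica`, `StorageStatus`, `handle_write` and `drain_applies` are made-up names, the pending queue stands in for whatever background thread would apply the writes, and "moving the directory" is modelled as appending `.err` to a string.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical status codes; they only mirror the two rocksdb failures named above.
enum class StorageStatus { kOk, kCorruption, kIOError };

enum class ReplicaState { kServing, kQuarantined };

// Hypothetical stand-in for a primary replica; not the real Pegasus class.
struct Replica {
    ReplicaState state = ReplicaState::kServing;
    std::string data_dir = "replica_dir";
    // Storage writes that were acknowledged to the client but not yet applied.
    std::deque<std::function<StorageStatus()>> pending_applies;
};

// Returns true when the client should be told the write succeeded.
// The answer depends only on the log state, never on the storage write.
bool handle_write(Replica& replica,
                  bool logs_committed,
                  std::function<StorageStatus()> apply_to_storage) {
    assert(static_cast<bool>(apply_to_storage));
    if (replica.state != ReplicaState::kServing || !logs_committed) {
        return false;
    }
    replica.pending_applies.push_back(std::move(apply_to_storage));
    return true;
}

// Applies queued writes. In a real server this would run in the background;
// here it is called explicitly so the example stays deterministic.
void drain_applies(Replica& replica) {
    while (replica.state == ReplicaState::kServing &&
           !replica.pending_applies.empty()) {
        auto apply = std::move(replica.pending_applies.front());
        replica.pending_applies.pop_front();
        if (apply() != StorageStatus::kOk) {
            // Storage failed after the client was already acknowledged:
            // take the replica out of service and mark its directory as bad.
            // The logged mutations remain the source of truth for recovery.
            replica.state = ReplicaState::kQuarantined;
            replica.data_dir += ".err";
            replica.pending_applies.clear();
        }
    }
}
```

In this sketch a write whose logs are committed is acknowledged even if the later storage write returns `kIOError`; the failure only shows up as the replica being taken out of service, after which it refuses new writes. Electing a new primary from the secondaries is left out.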
