On Sun, Nov 16, 2014 at 5:13 PM, Jimmy Lin <y2klyf+w...@gmail.com> wrote:

> I have read that read repair is supposed to run in the background, but
> does the coordinator node need to wait for the response (along with other
> normal read tasks) before returning the entire result to the caller?
>

For the 10% of requests where read repair is triggered, the coordinator
will send a request to every replica. (A data request to two replicas,
digest requests to the rest.) Once enough replicas have replied to satisfy
the consistency level, the result will be returned to the client; if
there's a mismatch in the responses from those replicas, a blocking repair
will be performed before responding to the client. Later, in the
background, the coordinator will check the remaining responses from
replicas to see if they match up. If any of them do not, they will be
repaired in the background.

>
> # How does a high rate of read repair impact performance? I read
> something saying that it will impact throughput but not latency, how so?
>

That's correct, it should impact throughput but not necessarily latency.
Throughput is lower because more replicas have to do work on each read,
but latency is unaffected (unless you're hitting capacity) because
blocking repair only happens under the same conditions that it normally
does: a mismatch among the replicas needed to satisfy the consistency
level.

>
> # Is it safe to even just make read_repair_chance = 0?
> (since we are mostly talking to one DC, the other DC most of the time
> serves as backup/emergency)
>

Sure, it's safe enough. People use read repair for different reasons.
Some would say that RR keeps their other datacenter's caches warm. Others
rely on it in place of normal repairs (which is not particularly safe, but
if your consistency requirements allow for it, it's fine). If you're
running regular repairs anyway, it's safe to turn off read repair.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>
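
To put rough numbers on the throughput point above, here is a minimal sketch of how many replicas do work per read. It is a simplified model, not Cassandra's actual code: the function names and parameters (`replicas_contacted`, `average_replica_load`, `required_for_cl`) are made up for illustration, and it assumes that a read without read repair contacts only as many replicas as the consistency level needs.

```python
import random


def replicas_contacted(replication_factor, required_for_cl,
                       read_repair_chance, rng=random):
    """Replicas the coordinator sends requests to for a single read.

    Simplified model (assumption): with probability read_repair_chance
    every replica is asked; otherwise only as many as the consistency
    level requires.
    """
    if rng.random() < read_repair_chance:
        return replication_factor
    return required_for_cl


def average_replica_load(replication_factor, required_for_cl,
                         read_repair_chance):
    """Expected number of replicas doing work per read in this model."""
    return (read_repair_chance * replication_factor
            + (1 - read_repair_chance) * required_for_cl)


if __name__ == "__main__":
    # Hypothetical setup: RF=3, reads at a consistency level needing 1 replica.
    for chance in (0.0, 0.1, 1.0):
        load = average_replica_load(3, 1, chance)
        print(f"read_repair_chance={chance}: ~{load:.2f} replicas per read")
```

In this model the extra cost shows up as more replicas doing work per read on average, which is a throughput cost. The client-facing wait is still governed by the replicas needed for the consistency level, which is why latency is not expected to change unless the cluster is near capacity.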