[ 
https://issues.apache.org/jira/browse/CASSANDRA-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012319#comment-14012319
 ] 

Sylvain Lebresne commented on CASSANDRA-6887:
---------------------------------------------

I don't think that patch is what we want to do. We have 
dclocal_read_repair_chance for exactly that kind of reason. If you don't want 
any digest sent to a remote DC, then you should set read_repair_chance 
to 0 and configure dclocal_read_repair_chance to whatever you want. Maybe 
that's something that needs documenting, and in fact I'd be in favor of making 
it the default, i.e. instead of having read_repair_chance=0.1 and 
dclocal_read_repair_chance=0, switching to read_repair_chance=0 and 
dclocal_read_repair_chance=0.1. If you have only one DC, then nothing changes 
from the current default, and if you have multiple DCs, I can agree that not 
crossing DC boundaries for read repair is a better default. But I'm not in 
favor of removing the option altogether.
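
As a rough sketch of that suggestion, the per-table settings could be changed 
in CQL along these lines. The keyspace and table names (my_keyspace.my_table) 
are hypothetical placeholders, and 0.1 is only an example value:

```cql
-- Sketch only: my_keyspace.my_table is a placeholder name, not from the ticket.
-- Disable cross-DC read repair and keep a 10% chance of DC-local read repair.
ALTER TABLE my_keyspace.my_table
  WITH read_repair_chance = 0.0
  AND dclocal_read_repair_chance = 0.1;
```

With these values, reads should no longer trigger digest requests to remote 
DCs for read repair purposes, while roughly one read in ten would still be 
checked against the other replicas in the local DC.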

That said, the behaviour described above does sound like a bug. Since by 
default read_repair_chance is 0.1, there should be cross-DC digest queries 
every 10 queries or so, and those *should* repair. If that's not the case, we 
should fix it.

> LOCAL_ONE read repair only does local repair, in spite of global digest 
> queries
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6887
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6887
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.0.6, x86-64 ubuntu precise
>            Reporter: Duncan Sands
>            Assignee: Aleksey Yeschenko
>             Fix For: 2.0.9, 2.1 rc1
>
>         Attachments: 6887-2.0.txt
>
>
> I have a cluster spanning two data centres.  Almost all of the writing (and a 
> lot of reading) is done in DC1.  DC2 is used for running the occasional 
> analytics query.  Reads in both data centres use LOCAL_ONE.  Read repair 
> settings are set to the defaults on all column families.
> I had a long network outage between the data centres; it lasted longer than 
> the hints window, so after it was over DC2 didn't have the latest 
> information.  Even after reading data many many times in DC2, the returned 
> data was still out of date: read repair was not correcting it.
> I then investigated using cqlsh in DC2, with tracing on.
> What I saw was:
>  - with consistency ONE, after about 10 read requests a digest request would 
> be sent to many nodes (spanning both data centres), and the data in DC2 would 
> be repaired.
>  - with consistency LOCAL_ONE, after about 10 read requests a digest request 
> would be sent to many nodes (spanning both data centres), but the data in DC2 
> would not be repaired.  This is in spite of digest requests being sent to 
> DC1, as shown by the tracing.
> So it looks like digest requests are being sent to both data centres, but 
> replies from outside the local data centre are ignored when using LOCAL_ONE.
> The same data is being queried all the time in DC1 with consistency 
> LOCAL_ONE, but this didn't result in the data in DC2 being read repaired 
> either.  This is a slightly different case to what I described above: in that 
> case the local node was out of date and the remote node had the latest data, 
> while here it is the other way round.
> It could be argued that you don't want cross data centre read repair when 
> using LOCAL_ONE.  But then why bother sending cross data centre digest 
> requests?  And if only doing local read repair is how it is supposed to work 
> then it would be good to document this somewhere.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
