Re: Read Repairs and CL
Thanks Sam - a couple of subtleties there that we missed in our review.

Cheers
Ben

On Tue, 30 Aug 2016 at 19:42 Sam Tunnicliffe wrote:

> Just to clarify a little further: it's true that read repair queries are
> performed at CL ALL, but this is slightly different from a regular,
> user-initiated query at that CL.
>
> Say you have RF=5 and you issue a read at CL ALL. The coordinator will
> send requests to all 5 replicas and block until it receives a response
> from each (or a timeout occurs) before replying to the client. This is
> the straightforward and intuitive case.
>
> If instead you read at CL QUORUM, the number of replicas required for the
> CL is 3, so the coordinator only contacts 3 nodes. If speculative retry
> is activated, an additional replica is added to the initial set. The
> coordinator will still only wait for 3 of the 4 responses before
> proceeding, but if a digest mismatch occurs the read repair queries are
> sent to all 4. It's this follow-up query that the coordinator executes at
> CL ALL, i.e. it requires all 4 replicas to respond to the read repair
> query before merging their results to figure out the canonical, latest
> data.
>
> You can see that the number of replicas queried/required for read repair
> is different than if the client actually requests a read at CL ALL (here
> it's 4, not 5); it's the behaviour of waiting for all *contacted*
> replicas to respond that is significant.
>
> There are additional considerations when constructing that initial
> replica set (which you can follow in
> o.a.c.service.AbstractReadExecutor::getReadExecutor), involving the
> table's read_repair_chance, dclocal_read_repair_chance and
> speculative_retry options. The main gotcha is global read repair (via
> read_repair_chance), which will trigger cross-DC repairs at CL ALL in the
> case of a digest mismatch, even if the requested CL is DC-local.
>
> On Sun, Aug 28, 2016 at 11:55 AM, Ben Slater wrote:
>
>> In case anyone else is interested - we figured this out. When C* decides
>> it needs to do a repair based on a digest mismatch from the initial
>> reads for the consistency level, it does actually try to do a read at
>> CL=ALL in order to get the most up-to-date data to use for the repair.
>>
>> This led to an interesting issue in our case where we had one node in an
>> RF=3 cluster down for maintenance (to correct data that became corrupted
>> due to a severe write overload) and started getting occasional "timeout
>> during read query at consistency LOCAL_QUORUM" failures. We believe this
>> is due to cases where data for a read was only available on one of the
>> two up replicas, which then triggered an attempted repair and a failed
>> read at CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the
>> behaviour so that C* reports a failure at the originally requested level
>> even when it was actually the attempted repair read at CL=ALL which
>> could not read sufficient replicas - a bit confusing (although I can
>> also see how getting CL=ALL errors when you thought you were reading at
>> QUORUM or ONE would be confusing).
>>
>> Cheers
>> Ben
>>
>> On Sun, 28 Aug 2016 at 10:52 Kurt Greaves wrote:
>>
>>> Looking at the wiki for the read path
>>> (http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom
>>> diagram for reading with a read repair, it states the following when
>>> "reading from all replica nodes" after there is a hash mismatch:
>>>
>>>> If hashes do not match, do conflict resolution. First step is to read
>>>> all data from all replica nodes excluding the fastest replica (since
>>>> CL=ALL)
>>>
>>> In the bottom left of the diagram it also states:
>>>
>>>> In this example:
>>>> RF>=2
>>>> CL=ALL
>>>
>>> The "(since CL=ALL)" implies that the CL for the read during the read
>>> repair is based off the CL of the query. However, I don't think that
>>> makes sense at other CLs. Anyway, I just want to clarify what CL the
>>> read for the read repair occurs at in cases where the overall query CL
>>> is not ALL.
>>>
>>> Thanks,
>>> Kurt.
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>
>> --
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798

--
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
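To make the replica arithmetic in Sam's reply above concrete, here is a minimal, self-contained Java sketch. It is not Cassandra's actual API; the class and method names are invented for illustration, and only the counts (RF=5, a quorum of 3, one speculative extra) come from the thread.

// ReadRepairReplicaMath.java: a toy, standalone sketch (not Cassandra code)
// of the counts described above: replicas contacted, replicas blocked for,
// and replicas the follow-up read repair read must hear from.
public class ReadRepairReplicaMath {

    // Replicas required to satisfy QUORUM for a given replication factor.
    static int quorumFor(int rf) {
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        int rf = 5;

        // User-initiated read at CL ALL: contact all 5, block for all 5.
        System.out.printf("CL ALL:    contacted=%d, blocked for=%d%n", rf, rf);

        // Read at CL QUORUM: block for 3, but with speculative retry an
        // extra replica is added, so 4 replicas are contacted.
        int blockFor = quorumFor(rf);  // 3
        int contacted = blockFor + 1;  // 4, with speculative retry
        System.out.printf("CL QUORUM: contacted=%d, blocked for=%d%n",
                          contacted, blockFor);

        // On a digest mismatch, the repair read goes to every *contacted*
        // replica and must hear back from all of them: 4 here, not the 5
        // a genuine client read at CL ALL would require.
        System.out.printf("repair read blocks for all contacted: %d of RF=%d%n",
                          contacted, rf);
    }
}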
Re: Read Repairs and CL
Just to clarify a little further: it's true that read repair queries are
performed at CL ALL, but this is slightly different from a regular,
user-initiated query at that CL.

Say you have RF=5 and you issue a read at CL ALL. The coordinator will send
requests to all 5 replicas and block until it receives a response from each
(or a timeout occurs) before replying to the client. This is the
straightforward and intuitive case.

If instead you read at CL QUORUM, the number of replicas required for the CL
is 3, so the coordinator only contacts 3 nodes. If speculative retry is
activated, an additional replica is added to the initial set. The
coordinator will still only wait for 3 of the 4 responses before proceeding,
but if a digest mismatch occurs the read repair queries are sent to all 4.
It's this follow-up query that the coordinator executes at CL ALL, i.e. it
requires all 4 replicas to respond to the read repair query before merging
their results to figure out the canonical, latest data.

You can see that the number of replicas queried/required for read repair is
different than if the client actually requests a read at CL ALL (here it's
4, not 5); it's the behaviour of waiting for all *contacted* replicas to
respond that is significant.

There are additional considerations when constructing that initial replica
set (which you can follow in
o.a.c.service.AbstractReadExecutor::getReadExecutor), involving the table's
read_repair_chance, dclocal_read_repair_chance and speculative_retry
options. The main gotcha is global read repair (via read_repair_chance),
which will trigger cross-DC repairs at CL ALL in the case of a digest
mismatch, even if the requested CL is DC-local.

On Sun, Aug 28, 2016 at 11:55 AM, Ben Slater wrote:

> In case anyone else is interested - we figured this out. When C* decides
> it needs to do a repair based on a digest mismatch from the initial reads
> for the consistency level, it does actually try to do a read at CL=ALL in
> order to get the most up-to-date data to use for the repair.
>
> This led to an interesting issue in our case where we had one node in an
> RF=3 cluster down for maintenance (to correct data that became corrupted
> due to a severe write overload) and started getting occasional "timeout
> during read query at consistency LOCAL_QUORUM" failures. We believe this
> is due to cases where data for a read was only available on one of the
> two up replicas, which then triggered an attempted repair and a failed
> read at CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the
> behaviour so that C* reports a failure at the originally requested level
> even when it was actually the attempted repair read at CL=ALL which could
> not read sufficient replicas - a bit confusing (although I can also see
> how getting CL=ALL errors when you thought you were reading at QUORUM or
> ONE would be confusing).
>
> Cheers
> Ben
>
> On Sun, 28 Aug 2016 at 10:52 Kurt Greaves wrote:
>
>> Looking at the wiki for the read path
>> (http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom
>> diagram for reading with a read repair, it states the following when
>> "reading from all replica nodes" after there is a hash mismatch:
>>
>>> If hashes do not match, do conflict resolution. First step is to read
>>> all data from all replica nodes excluding the fastest replica (since
>>> CL=ALL)
>>
>> In the bottom left of the diagram it also states:
>>
>>> In this example:
>>> RF>=2
>>> CL=ALL
>>
>> The "(since CL=ALL)" implies that the CL for the read during the read
>> repair is based off the CL of the query. However, I don't think that
>> makes sense at other CLs. Anyway, I just want to clarify what CL the
>> read for the read repair occurs at in cases where the overall query CL
>> is not ALL.
>>
>> Thanks,
>> Kurt.
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>
> --
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
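For anyone chasing the read_repair_chance gotcha Sam mentions, the shape of the decision is easy to picture with a rough sketch. This is a loose, hypothetical rendering of the kind of choice made in o.a.c.service.AbstractReadExecutor::getReadExecutor, not the real implementation; the class, enum, and method names here are assumed.

import java.util.concurrent.ThreadLocalRandom;

// RoughReadRepairDecision.java: hypothetical sketch, not Cassandra source.
public class RoughReadRepairDecision {

    enum Decision { NONE, DC_LOCAL, GLOBAL }

    // read_repair_chance and dclocal_read_repair_chance are the per-table
    // options named above; a single random roll picks the repair mode.
    static Decision decide(double globalChance, double dcLocalChance) {
        double roll = ThreadLocalRandom.current().nextDouble();
        // GLOBAL pulls replicas from every DC into the read, so a digest
        // mismatch then triggers a repair read at CL ALL across DCs, even
        // when the client asked for a DC-local CL.
        if (roll < globalChance)
            return Decision.GLOBAL;
        if (roll < dcLocalChance)
            return Decision.DC_LOCAL;
        return Decision.NONE;
    }

    public static void main(String[] args) {
        // e.g. read_repair_chance = 0.0, dclocal_read_repair_chance = 0.1
        System.out.println(decide(0.0, 0.1));
    }
}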
Re: Read Repairs and CL
In case anyone else is interested - we figured this out. When C* decides it
needs to do a repair based on a digest mismatch from the initial reads for
the consistency level, it does actually try to do a read at CL=ALL in order
to get the most up-to-date data to use for the repair.

This led to an interesting issue in our case where we had one node in an
RF=3 cluster down for maintenance (to correct data that became corrupted due
to a severe write overload) and started getting occasional "timeout during
read query at consistency LOCAL_QUORUM" failures. We believe this is due to
cases where data for a read was only available on one of the two up
replicas, which then triggered an attempted repair and a failed read at
CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the behaviour so
that C* reports a failure at the originally requested level even when it was
actually the attempted repair read at CL=ALL which could not read sufficient
replicas - a bit confusing (although I can also see how getting CL=ALL
errors when you thought you were reading at QUORUM or ONE would be
confusing).

Cheers
Ben

On Sun, 28 Aug 2016 at 10:52 Kurt Greaves wrote:

> Looking at the wiki for the read path
> (http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom diagram
> for reading with a read repair, it states the following when "reading from
> all replica nodes" after there is a hash mismatch:
>
>> If hashes do not match, do conflict resolution. First step is to read all
>> data from all replica nodes excluding the fastest replica (since CL=ALL)
>
> In the bottom left of the diagram it also states:
>
>> In this example:
>> RF>=2
>> CL=ALL
>
> The "(since CL=ALL)" implies that the CL for the read during the read
> repair is based off the CL of the query. However, I don't think that makes
> sense at other CLs. Anyway, I just want to clarify what CL the read for
> the read repair occurs at in cases where the overall query CL is not ALL.
>
> Thanks,
> Kurt.
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com

--
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798
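The reporting behaviour Ben describes is easier to see as a toy example. The sketch below is hypothetical (assumed names, not Cassandra's real exception classes); it only illustrates how a timeout on the internal CL ALL repair read can surface to the client under the originally requested CL, per CASSANDRA-7947.

// RepairTimeoutReporting.java: toy sketch with invented names.
public class RepairTimeoutReporting {

    static class ReadTimeout extends RuntimeException {
        ReadTimeout(String consistency) {
            super("timeout during read query at consistency " + consistency);
        }
    }

    // The repair read must hear from every contacted replica ("ALL" of them).
    static void repairRead(int contacted, int responded) {
        if (responded < contacted)
            throw new ReadTimeout("ALL");
    }

    static void clientRead(String requestedCl, int contacted, int responded) {
        try {
            repairRead(contacted, responded);
        } catch (ReadTimeout internal) {
            // Re-report at the client's CL, masking the internal CL ALL read.
            throw new ReadTimeout(requestedCl);
        }
    }

    public static void main(String[] args) {
        // RF=3 with one node down for maintenance: 2 replicas contacted,
        // only 1 responds to the repair read in time.
        try {
            clientRead("LOCAL_QUORUM", 2, 1);
        } catch (ReadTimeout e) {
            // Prints: timeout during read query at consistency LOCAL_QUORUM
            System.out.println(e.getMessage());
        }
    }
}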