Re: Read Repairs and CL

2016-08-30 Thread Ben Slater
Thanks Sam - a couple of subtleties there that we missed in our review.

Cheers
Ben

On Tue, 30 Aug 2016 at 19:42 Sam Tunnicliffe  wrote:

> Just to clarify a little further, it's true that read repair queries are
> performed at CL ALL, but this is slightly different to a regular,
> user-initiated query at that CL.
>
> Say you have RF=5 and you issue a read at CL ALL, the coordinator will send
> requests to all 5 replicas and block until it receives a response from each
> (or a timeout occurs) before replying to the client. This is the
> straightforward and intuitive case.
>
> If instead you read at CL QUORUM, the # of replicas required for CL is 3,
> so the coordinator only contacts 3 nodes. In the case where a speculative
> retry is activated, an additional replica is added to the initial set. The
> coordinator will still only wait for 3 out of the 4 responses before
> proceeding, but if a digest mismatch occurs the read repair queries are
> sent to all 4. It's this follow-up query that the coordinator executes at
> CL ALL, i.e. it requires all 4 replicas to respond to the read repair query
> before merging their results to figure out the canonical, latest data.
>
> You can see that the number of replicas queried/required for read repair
> is different than if the client actually requests a read at CL ALL (i.e.
> here it's 4, not 5); it's the behaviour of waiting for all *contacted*
> replicas to respond that is significant here.
>
> There are additional considerations when constructing that initial replica
> set (which you can follow in
> o.a.c.service.AbstractReadExecutor::getReadExecutor), involving the table's
> read_repair_chance, dclocal_read_repair_chance and speculative_retry
> options. The main gotcha is global read repair (via read_repair_chance),
> which will trigger cross-dc repairs at CL ALL in the case of a digest
> mismatch, even if the requested CL is DC-local.
>
>
> On Sun, Aug 28, 2016 at 11:55 AM, Ben Slater 
> wrote:
>
>> In case anyone else is interested - we figured this out. When C* decides
>> it needs to do a repair based on a digest mismatch from the initial reads
>> for the consistency level, it does actually try to do a read at CL=ALL in
>> order to get the most up-to-date data to use for the repair.
>>
>> This led to an interesting issue in our case where we had one node in an
>> RF3 cluster down for maintenance (to correct data that became corrupted due
>> to a severe write overload) and started getting occasional “timeout during
>> read query at consistency LOCAL_QUORUM” failures. We believe this is due
>> to the case where data for a read was only available on one of the two up
>> replicas, which then triggered an attempt to repair and a failed read at
>> CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the behaviour so
>> that C* reports a failure at the originally requested level even when it was
>> actually the attempted repair read at CL=ALL which could not read
>> sufficient replicas - a bit confusing (although I can also see how getting
>> CL=ALL errors when you thought you were reading at QUORUM or ONE would be
>> confusing).
>>
>> Cheers
>> Ben
>>
>> On Sun, 28 Aug 2016 at 10:52 kurt Greaves  wrote:
>>
>>> Looking at the wiki for the read path (
>>> http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom
>>> diagram for reading with a read repair, it states the following when
>>> "reading from all replica nodes" after there is a hash mismatch:
>>>
>>>> If hashes do not match, do conflict resolution. First step is to read
>>>> all data from all replica nodes excluding the fastest replica (since
>>>> CL=ALL)
>>>
>>> In the bottom left of the diagram it also states:
>>>
>>>> In this example:
>>>> RF>=2
>>>> CL=ALL
>>>
>>> The (since CL=ALL) implies that the CL for the read during the read
>>> repair is based on the CL of the query. However, I don't think that makes
>>> sense at other CLs. Anyway, I just want to clarify what CL the read for the
>>> read repair occurs at for cases where the overall query CL is not ALL.
>>>
>>> Thanks,
>>> Kurt.
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com
>>>
>> --
>> 
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>>
>
> --

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798


Re: Read Repairs and CL

2016-08-30 Thread Sam Tunnicliffe
Just to clarify a little further, it's true that read repair queries are
performed at CL ALL, but this is slightly different to a regular,
user-initiated query at that CL.

Say you have RF=5 and you issue a read at CL ALL, the coordinator will send
requests to all 5 replicas and block until it receives a response from each
(or a timeout occurs) before replying to the client. This is the
straightforward and intuitive case.
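As a rough illustration of those counts (a simplified sketch with invented names, not Cassandra's actual code), the number of replica responses a coordinator blocks for in a single-DC keyspace:

```java
// Simplified sketch (not Cassandra's real code): responses a coordinator
// blocks for at a given consistency level, for a single-DC keyspace.
public final class BlockFor {
    public enum CL { ONE, QUORUM, ALL }

    // QUORUM = floor(RF/2) + 1; ALL = RF.
    public static int blockFor(CL cl, int rf) {
        switch (cl) {
            case ONE:    return 1;
            case QUORUM: return rf / 2 + 1;
            case ALL:    return rf;
            default:     throw new AssertionError(cl);
        }
    }

    public static void main(String[] args) {
        // With RF=5: ALL blocks for all 5 replicas, QUORUM for only 3.
        System.out.println(blockFor(CL.ALL, 5));    // 5
        System.out.println(blockFor(CL.QUORUM, 5)); // 3
    }
}
```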

If instead you read at CL QUORUM, the # of replicas required for CL is 3,
so the coordinator only contacts 3 nodes. In the case where a speculative
retry is activated, an additional replica is added to the initial set. The
coordinator will still only wait for 3 out of the 4 responses before
proceeding, but if a digest mismatch occurs the read repair queries are
sent to all 4. It's this follow-up query that the coordinator executes at
CL ALL, i.e. it requires all 4 replicas to respond to the read repair query
before merging their results to figure out the canonical, latest data.

You can see that the number of replicas queried/required for read repair is
different than if the client actually requests a read at CL ALL (i.e. here
it's 4, not 5); it's the behaviour of waiting for all *contacted* replicas
to respond that is significant here.
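The arithmetic above can be sketched as a toy model (invented names, not Cassandra internals): with RF=5 and CL QUORUM the coordinator blocks for 3 responses, speculative retry contacts one extra replica, and on a digest mismatch the repair round waits for every *contacted* replica.

```java
// Toy model of the counts Sam describes (names invented, not Cassandra
// code). The repair round behaves like "CL ALL over the contacted set".
public final class RepairRound {
    // Replicas actually contacted for the initial read.
    public static int contacted(int blockFor, boolean speculativeRetry) {
        return speculativeRetry ? blockFor + 1 : blockFor;
    }

    // Responses the follow-up read-repair query blocks for: every
    // contacted replica must answer before results are merged.
    public static int repairBlockFor(int contactedReplicas) {
        return contactedReplicas;
    }

    public static void main(String[] args) {
        int reached = contacted(3, true);            // 4 of the 5 replicas
        System.out.println(reached);                 // 4
        System.out.println(repairBlockFor(reached)); // 4, not RF=5
    }
}
```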

There are additional considerations when constructing that initial replica
set (which you can follow in
o.a.c.service.AbstractReadExecutor::getReadExecutor), involving the table's
read_repair_chance, dclocal_read_repair_chance and speculative_retry
options. The main gotcha is global read repair (via read_repair_chance),
which will trigger cross-dc repairs at CL ALL in the case of a digest
mismatch, even if the requested CL is DC-local.
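That chance-based decision can be sketched roughly as follows; the thresholds and the way the two chances combine are simplified assumptions here, not a faithful copy of getReadExecutor:

```java
import java.util.concurrent.ThreadLocalRandom;

// Simplified sketch of the chance-based read repair decision. The real
// logic lives around AbstractReadExecutor and differs in detail.
public final class RepairDecision {
    public enum Decision { NONE, DC_LOCAL, GLOBAL }

    // roll is a uniform [0,1) sample. A GLOBAL decision is the cross-DC
    // "repair at CL ALL on digest mismatch" gotcha described above.
    public static Decision decide(double readRepairChance,
                                  double dcLocalReadRepairChance,
                                  double roll) {
        if (roll < readRepairChance)
            return Decision.GLOBAL;
        if (roll < readRepairChance + dcLocalReadRepairChance)
            return Decision.DC_LOCAL;
        return Decision.NONE;
    }

    public static void main(String[] args) {
        double roll = ThreadLocalRandom.current().nextDouble();
        System.out.println(decide(0.1, 0.0, roll));
    }
}
```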


On Sun, Aug 28, 2016 at 11:55 AM, Ben Slater 
wrote:

> In case anyone else is interested - we figured this out. When C* decides
> it needs to do a repair based on a digest mismatch from the initial reads
> for the consistency level, it does actually try to do a read at CL=ALL in
> order to get the most up-to-date data to use for the repair.
>
> This led to an interesting issue in our case where we had one node in an
> RF3 cluster down for maintenance (to correct data that became corrupted due
> to a severe write overload) and started getting occasional “timeout during
> read query at consistency LOCAL_QUORUM” failures. We believe this is due
> to the case where data for a read was only available on one of the two up
> replicas, which then triggered an attempt to repair and a failed read at
> CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the behaviour so
> that C* reports a failure at the originally requested level even when it was
> actually the attempted repair read at CL=ALL which could not read
> sufficient replicas - a bit confusing (although I can also see how getting
> CL=ALL errors when you thought you were reading at QUORUM or ONE would be
> confusing).
>
> Cheers
> Ben
>
> On Sun, 28 Aug 2016 at 10:52 kurt Greaves  wrote:
>
>> Looking at the wiki for the read path (http://wiki.apache.org/
>> cassandra/ReadPathForUsers), in the bottom diagram for reading with a
>> read repair, it states the following when "reading from all replica nodes"
>> after there is a hash mismatch:
>>
>>> If hashes do not match, do conflict resolution. First step is to read all
>>> data from all replica nodes excluding the fastest replica (since CL=ALL)
>>
>> In the bottom left of the diagram it also states:
>>
>>> In this example:
>>> RF>=2
>>> CL=ALL
>> The (since CL=ALL) implies that the CL for the read during the read
>> repair is based on the CL of the query. However, I don't think that makes
>> sense at other CLs. Anyway, I just want to clarify what CL the read for the
>> read repair occurs at for cases where the overall query CL is not ALL.
>>
>> Thanks,
>> Kurt.
>>
>> --
>> Kurt Greaves
>> k...@instaclustr.com
>> www.instaclustr.com
>>
> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


Re: Read Repairs and CL

2016-08-28 Thread Ben Slater
In case anyone else is interested - we figured this out. When C* decides it
needs to do a repair based on a digest mismatch from the initial reads for
the consistency level, it does actually try to do a read at CL=ALL in order
to get the most up-to-date data to use for the repair.

This led to an interesting issue in our case where we had one node in an
RF3 cluster down for maintenance (to correct data that became corrupted due
to a severe write overload) and started getting occasional “timeout during
read query at consistency LOCAL_QUORUM” failures. We believe this is due to
the case where data for a read was only available on one of the two up
replicas, which then triggered an attempt to repair and a failed read at
CL=ALL. It seems that CASSANDRA-7947 (a while ago) changed the behaviour so
that C* reports a failure at the originally requested level even when it was
actually the attempted repair read at CL=ALL which could not read
sufficient replicas - a bit confusing (although I can also see how getting
CL=ALL errors when you thought you were reading at QUORUM or ONE would be
confusing).
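The reporting quirk described above can be sketched as follows (a simplified stand-in, not Cassandra's real exception types): the internal repair read runs at ALL, but the timeout surfaced to the client names the CL the client requested.

```java
// Sketch of the post-CASSANDRA-7947 reporting behaviour (simplified
// stand-in, not Cassandra's actual exception handling).
public final class TimeoutMessage {
    // Builds the client-facing message: it names the requested CL, even
    // when it was the internal CL=ALL repair read that timed out.
    public static String forClient(String requestedCl) {
        return "timeout during read query at consistency " + requestedCl;
    }

    public static void main(String[] args) {
        // The message Ben's application saw, despite the CL=ALL repair read.
        System.out.println(forClient("LOCAL_QUORUM"));
    }
}
```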

Cheers
Ben

On Sun, 28 Aug 2016 at 10:52 kurt Greaves  wrote:

> Looking at the wiki for the read path (
> http://wiki.apache.org/cassandra/ReadPathForUsers), in the bottom diagram
> for reading with a read repair, it states the following when "reading from
> all replica nodes" after there is a hash mismatch:
>
>> If hashes do not match, do conflict resolution. First step is to read all
>> data from all replica nodes excluding the fastest replica (since CL=ALL)
>
> In the bottom left of the diagram it also states:
>
>> In this example:
>> RF>=2
>> CL=ALL
> The (since CL=ALL) implies that the CL for the read during the read repair
> is based on the CL of the query. However, I don't think that makes sense at
> other CLs. Anyway, I just want to clarify what CL the read for the read
> repair occurs at for cases where the overall query CL is not ALL.
>
> Thanks,
> Kurt.
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>
-- 

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798