[
https://issues.apache.org/jira/browse/CASSANDRA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933610#comment-15933610
]
Ariel Weisberg commented on CASSANDRA-13289:
--------------------------------------------
bq. maybe only instantiate AbstractWriteResponseHandler#responsesAndExpirations
in #setIdealCLResponseHandler(), and thus only create the AtomicInteger when
you know you are actually going to use it.
Sure. There is a part of me that doesn't like having a null field, but then
again the NPE is asserting something important.
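Roughly the shape I have in mind, as a sketch (the class and method bodies are illustrative, not the actual patch):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: the counter is allocated solely when ideal CL tracking is
// enabled, so the field stays null on the common path and accidental use
// without configuration fails fast with an NPE.
final class IdealCLTrackingSketch
{
    private AtomicInteger responsesAndExpirations; // null => no ideal CL configured

    void setIdealCLResponseHandler(int expectedResponses)
    {
        responsesAndExpirations = new AtomicInteger(expectedResponses);
    }

    void onResponseOrExpiration()
    {
        // Deliberately not null-checked: getting here without the handler
        // having been installed is a bug worth surfacing.
        if (responsesAndExpirations.decrementAndGet() == 0)
            recordIdealCLMetrics();
    }

    private void recordIdealCLMetrics() { /* per-keyspace metrics */ }
}
{code}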
bq. if the ideal CL and the requested CL are the same, should we even bother
capturing metrics about it? I'm kinda mixed on it...
bq. what happens if the user mixes non-CAS consistency levels with CAS
consistency levels (or vice versa)? I think the behavior will be correct (we
won't inadvertently violate paxos semantics), but the semantic difference
between CAS and non-CAS requests might not be meaningful. So perhaps ignore the
idealCl if the CL types are different? wdyt?
So we could try to block people from doing combinations of things that don't
make sense or aren't useful, but what is the penalty if they do? Their system
will continue running; they just won't get anything useful from this metric.
When you think about it, for the purpose of what this is measuring, CAS and
non-CAS are the same. Only the commit will have a write response handler, and
SERIAL == QUORUM and LOCAL_SERIAL == LOCAL_QUORUM.
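To make that concrete, the equivalence for this metric amounts to a tiny normalization (a sketch; the helper and its placement are hypothetical):
{code:java}
import org.apache.cassandra.db.ConsistencyLevel;

// Sketch: for the purpose of this metric a CAS commit at SERIAL blocks for
// the same replicas as QUORUM, and LOCAL_SERIAL as LOCAL_QUORUM, so the
// serial levels can be folded into their quorum equivalents before comparing.
final class IdealCLNormalization
{
    static ConsistencyLevel normalizeForIdealCL(ConsistencyLevel cl)
    {
        switch (cl)
        {
            case SERIAL:       return ConsistencyLevel.QUORUM;
            case LOCAL_SERIAL: return ConsistencyLevel.LOCAL_QUORUM;
            default:           return cl;
        }
    }
}
{code}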
I am not sure there are invalid values other than ideal == current. The problem
is that the error occurs at request time, not configuration time, so I can't
validate the configuration: I don't know what the CL of subsequent requests
will be. I could skip counting, but the operator asked me to do something, and
even if it looks useless maybe they still expect the metrics to be accurate?
I am in favor of providing mechanism, not policy, in these cases. I don't want
to throw an error at the request level, I can't validate the configuration,
and I don't want to silently not increment the counters. The only other viable
alternative is maybe a rate-limited log warning.
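Something like this, sketched with plain SLF4J and a hand-rolled interval gate rather than whatever utility the patch would actually use (all names are illustrative):
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of a rate-limited warning: at most one line per minute when the
// requested CL makes the configured ideal CL meaningless, instead of
// throwing at request time or silently skipping the counters.
final class IdealCLWarningSketch
{
    private static final Logger logger = LoggerFactory.getLogger(IdealCLWarningSketch.class);
    private static final long INTERVAL_NANOS = TimeUnit.MINUTES.toNanos(1);
    private static final AtomicLong lastWarned = new AtomicLong(System.nanoTime() - INTERVAL_NANOS);

    static void maybeWarn(String requestCL, String idealCL)
    {
        long now = System.nanoTime();
        long last = lastWarned.get();
        if (now - last >= INTERVAL_NANOS && lastWarned.compareAndSet(last, now))
            logger.warn("Ideal CL {} is not meaningful for requests at CL {}", idealCL, requestCL);
    }
}
{code}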
bq. how will timed out message metrics be affected? We create an entry in
MessagingService#callbacks for each peer contacted for an operation (just
talking reads/mutations right now), and say the request CL is satisfied, but
the idealCL doesn't hear back from some nodes. In that case we'll increment the
timeouts, ConnectionMetrics.totalTimeouts.mark(), even though they weren't
explicitly part of the user's request. It might be confusing to users or
operators. I'm not sure how hard it is to code around that, or if it's
worthwhile. If we feel it's not, perhaps we just document it in the yaml that
"you may see higher than usual timeout counts". Thoughts?
This doesn't impact how timeouts are counted or callbacks are registered. All
this does is hook in and maintain a separate set of metrics for the ideal CL.
It operates within the existing callback that is already being registered and
timed out, and it doesn't register additional callbacks. There should be no
impact on existing metrics.
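To illustrate, the hook is shaped roughly like this (a sketch; the counts and names are made up):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the ideal CL counter is decremented from inside the callback the
// request-CL handler already registered with MessagingService. No second
// callback exists, so timeout accounting for the request is unchanged.
final class ResponseHookSketch
{
    private final AtomicInteger pendingForRequestCL = new AtomicInteger(2); // e.g. LOCAL_QUORUM, RF=3
    private final AtomicInteger pendingForIdealCL = new AtomicInteger(4);   // e.g. EACH_QUORUM, two DCs

    void onResponse()
    {
        if (pendingForRequestCL.decrementAndGet() == 0)
            signalRequestSatisfied();   // existing behavior, untouched

        if (pendingForIdealCL.decrementAndGet() == 0)
            markIdealCLAchieved();      // only the new, separate metrics
    }

    private void signalRequestSatisfied() { /* wake the coordinator as today */ }
    private void markIdealCLAchieved()   { /* update ideal CL keyspace metrics */ }
}
{code}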
> Make it possible to monitor an ideal consistency level separate from actual
> consistency level
> ---------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-13289
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13289
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 4.0
>
>
> As an operator there are several issues related to multi-datacenter
> replication and consistency you may want more information on from your
> production database.
> For instance: if your application writes at LOCAL_QUORUM, how often are those
> writes failing to achieve EACH_QUORUM at other data centers? If you failed
> your application over to one of those data centers, roughly how inconsistent
> might it be, given the number of writes that didn't propagate since the last
> incremental repair?
> You might also want to know roughly what the latency of writes would be if
> you switched to a different consistency level: for instance, you are writing
> at LOCAL_QUORUM and want to know what would happen if you switched to
> EACH_QUORUM.
> The proposed change is to allow an ideal_consistency_level to be specified in
> cassandra.yaml as well as get/set via JMX. If no ideal consistency level is
> specified, no additional tracking is done.
> If an ideal consistency level is specified, then the
> {{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler
> that tracks whether the ideal consistency level is met before a write times
> out. It also tracks the latency for achieving the ideal CL of successful
> writes.
> These two metrics would be reported on a per keyspace basis.
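A sketch of the two per-keyspace metrics described above, using the Codahale metrics library Cassandra already depends on (metric names here are illustrative, not necessarily the ones the patch registers):
{code:java}
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Sketch: one counter for writes that timed out before reaching the ideal
// CL, and one timer for the latency of writes that did reach it.
final class IdealCLKeyspaceMetricsSketch
{
    private final Counter writeFailedIdealCL;
    private final Timer idealCLWriteLatency;

    IdealCLKeyspaceMetricsSketch(MetricRegistry registry, String keyspace)
    {
        writeFailedIdealCL = registry.counter(keyspace + ".WriteFailedIdealCL");
        idealCLWriteLatency = registry.timer(keyspace + ".IdealCLWriteLatency");
    }

    void onIdealCLAchieved(long latencyNanos)
    {
        idealCLWriteLatency.update(latencyNanos, TimeUnit.NANOSECONDS);
    }

    void onWriteTimedOutBeforeIdealCL()
    {
        writeFailedIdealCL.inc();
    }
}
{code}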