[
https://issues.apache.org/jira/browse/CASSANDRA-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933610#comment-15933610
]
Ariel Weisberg commented on CASSANDRA-13289:
--------------------------------------------
bq. maybe only instantiate AbstractWriteResponseHandler#responsesAndExpirations
in #setIdealCLResponseHandler(), and thus only create the AtomicInteger when
you know you are actually going to use it.
Sure. There is a part of me that doesn't like having a null field, but then
again the NPE is asserting something important.
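Roughly the shape I have in mind, as a sketch (the class and method bodies are illustrative, not the actual patch):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: the counter is allocated solely when ideal CL tracking is
// enabled, so the field stays null on the common path and accidental use
// without configuration fails fast with an NPE.
final class IdealCLTrackingSketch
{
    private AtomicInteger responsesAndExpirations; // null => no ideal CL configured

    void setIdealCLResponseHandler(int expectedResponses)
    {
        responsesAndExpirations = new AtomicInteger(expectedResponses);
    }

    void onResponseOrExpiration()
    {
        // Deliberately not null-checked: getting here without the handler
        // having been installed is a bug worth surfacing.
        if (responsesAndExpirations.decrementAndGet() == 0)
            recordIdealCLMetrics();
    }

    private void recordIdealCLMetrics() { /* per-keyspace metrics */ }
}
{code}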
bq. if the ideal CL and the requested CL are the same, should we even bother
capturing metrics about it? I'm kinda mixed on it...
bq. what happens if the user mixes non-CAS consistency levels with CAS
consistency levels (or vice versa)? I think the behavior will be correct (we
won't inadvertently violate paxos semantics), but the semantic difference
between CAS and non-CAS requests might not be meaningful. So perhaps ignore the
idealCl if the CL types are different? wdyt?
So we could try to block people from doing combinations of things that don't
make sense or aren't useful, but what is the penalty if they do? Their system
will continue running; they just won't get anything useful from this metric.
When you think about it, for the purpose of what this is measuring, CAS and
non-CAS are the same. Only the commit will have a write response handler, and
SERIAL == QUORUM and LOCAL_SERIAL == LOCAL_QUORUM.
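To make that concrete, the equivalence for this metric amounts to a tiny normalization (a sketch; the helper and its placement are hypothetical):
{code:java}
import org.apache.cassandra.db.ConsistencyLevel;

// Sketch: for the purpose of this metric a CAS commit at SERIAL blocks for
// the same replicas as QUORUM, and LOCAL_SERIAL as LOCAL_QUORUM, so the
// serial levels can be folded into their quorum equivalents before comparing.
final class IdealCLNormalization
{
    static ConsistencyLevel normalizeForIdealCL(ConsistencyLevel cl)
    {
        switch (cl)
        {
            case SERIAL:       return ConsistencyLevel.QUORUM;
            case LOCAL_SERIAL: return ConsistencyLevel.LOCAL_QUORUM;
            default:           return cl;
        }
    }
}
{code}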
I am not sure there are invalid values other than ideal == current. The problem
is that the error occurs at request time, not configuration time, so I can't
validate the configuration: I don't know what the CL of subsequent requests
will be. I could skip counting, but the operator asked me to do something, and
even if it looks useless maybe they still expect the metrics to be accurate?
I am in favor of providing mechanism, not policy, in these cases. I don't want
to throw an error at the request level, I can't validate the configuration,
and I don't want to silently not increment the counters. The only other viable
alternative is maybe a rate-limited log warning.
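Something like this, sketched with plain SLF4J and a hand-rolled interval gate rather than whatever utility the patch would actually use (all names are illustrative):
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of a rate-limited warning: at most one line per minute when the
// requested CL makes the configured ideal CL meaningless, instead of
// throwing at request time or silently skipping the counters.
final class IdealCLWarningSketch
{
    private static final Logger logger = LoggerFactory.getLogger(IdealCLWarningSketch.class);
    private static final long INTERVAL_NANOS = TimeUnit.MINUTES.toNanos(1);
    private static final AtomicLong lastWarned = new AtomicLong(System.nanoTime() - INTERVAL_NANOS);

    static void maybeWarn(String requestCL, String idealCL)
    {
        long now = System.nanoTime();
        long last = lastWarned.get();
        if (now - last >= INTERVAL_NANOS && lastWarned.compareAndSet(last, now))
            logger.warn("Ideal CL {} is not meaningful for requests at CL {}", idealCL, requestCL);
    }
}
{code}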
bq. how will timed out message metrics be affected? We create an entry in
MessagingService#callbacks for each peer contacted for an operation (just
talking reads/mutations right now), and say the request CL is satisfied, but
the idealCL doesn't hear back from some nodes. In that case we'll increment the
timeouts, ConnectionMetrics.totalTimeouts.mark(), even though they weren't
explicitly part of the user's request. It might be confusing to users or
operators. I'm not sure how hard it is to code around that, or if it's
worthwhile. If we feel it's not, perhaps we just document it in the yaml that
"you may see higher than usual timeout counts". Thoughts?
This doesn't impact how timeouts are counted or callbacks are registered. All
this does is hook in and maintain a separate set of metrics for the ideal CL.
It operates within the existing callback that is already being registered and
timed out, and it doesn't register additional callbacks. There should be no
impact on existing metrics.
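To illustrate, the hook is shaped roughly like this (a sketch; the counts and names are made up):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the ideal CL counter is decremented from inside the callback the
// request-CL handler already registered with MessagingService. No second
// callback exists, so timeout accounting for the request is unchanged.
final class ResponseHookSketch
{
    private final AtomicInteger pendingForRequestCL = new AtomicInteger(2); // e.g. LOCAL_QUORUM, RF=3
    private final AtomicInteger pendingForIdealCL = new AtomicInteger(4);   // e.g. EACH_QUORUM, two DCs

    void onResponse()
    {
        if (pendingForRequestCL.decrementAndGet() == 0)
            signalRequestSatisfied();   // existing behavior, untouched

        if (pendingForIdealCL.decrementAndGet() == 0)
            markIdealCLAchieved();      // only the new, separate metrics
    }

    private void signalRequestSatisfied() { /* wake the coordinator as today */ }
    private void markIdealCLAchieved()   { /* update ideal CL keyspace metrics */ }
}
{code}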
> Make it possible to monitor an ideal consistency level separate from actual
> consistency level
> ---------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-13289
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13289
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 4.0
>
>
> As an operator there are several issues related to multi-datacenter
> replication and consistency you may want more information on from your
> production database.
> For instance: if your application writes at LOCAL_QUORUM, how often are those
> writes failing to achieve EACH_QUORUM at other data centers? If you failed
> your application over to one of those data centers, roughly how inconsistent
> might it be, given the number of writes that didn't propagate since the last
> incremental repair?
> You might also want to know roughly what the latency of writes would be if
> you switched to a different consistency level: for instance, you are writing
> at LOCAL_QUORUM and want to know what would happen if you switched to
> EACH_QUORUM.
> The proposed change is to allow an ideal_consistency_level to be specified in
> cassandra.yaml as well as get/set via JMX. If no ideal consistency level is
> specified, no additional tracking is done.
> If an ideal consistency level is specified, then the
> {{AbstractWriteResponseHandler}} will contain a delegate WriteResponseHandler
> that tracks whether the ideal consistency level is met before a write times
> out. It also tracks the latency for achieving the ideal CL of successful
> writes.
> These two metrics would be reported on a per keyspace basis.
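A sketch of the two per-keyspace metrics described above, using the Codahale metrics library Cassandra already depends on (metric names here are illustrative, not necessarily the ones the patch registers):
{code:java}
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Sketch: one counter for writes that timed out before reaching the ideal
// CL, and one timer for the latency of writes that did reach it.
final class IdealCLKeyspaceMetricsSketch
{
    private final Counter writeFailedIdealCL;
    private final Timer idealCLWriteLatency;

    IdealCLKeyspaceMetricsSketch(MetricRegistry registry, String keyspace)
    {
        writeFailedIdealCL = registry.counter(keyspace + ".WriteFailedIdealCL");
        idealCLWriteLatency = registry.timer(keyspace + ".IdealCLWriteLatency");
    }

    void onIdealCLAchieved(long latencyNanos)
    {
        idealCLWriteLatency.update(latencyNanos, TimeUnit.NANOSECONDS);
    }

    void onWriteTimedOutBeforeIdealCL()
    {
        writeFailedIdealCL.inc();
    }
}
{code}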