I do miss this from other RDBMSs. If you could come up with a
light-touch way to do this, I think a lot of people would be quite
happy about it.

On Wed, Jan 25, 2017 at 2:02 PM, Corentin Chary
<corentin.ch...@gmail.com> wrote:
> On Wed, Jan 25, 2017 at 9:55 PM, Sam Overton <samover...@gmail.com> wrote:
>> Hello cassandra-dev,
>>
>> I would like to continue the momentum on improving Cassandra's tracing,
>> following Mick's excellent work on pluggable tracing and Zipkin support.
>>
>> There are a couple of areas we can improve that would make tracing an even
>> more useful tool for cluster operators to diagnose ongoing issues.
>>
>> The control we currently have over tracing is coarse and somewhat
>> cumbersome. Enabling tracing from the client for a specific query is fine
>> for application developers, particularly in an environment where Zipkin is
>> being used to trace all parts of the system and show an aggregated view.
>> For an operator investigating an issue, however, this does not always give
>> us the control that we need in order to obtain relevant data. We often need
>> to diagnose an issue without the possibility of making any changes in the
>> client, and often without prior knowledge of which queries at the
>> application level are experiencing poor performance.
>>
>> Our only other way to trigger tracing is nodetool settraceprobability,
>> which affects only a single node and gives us no control over precisely
>> which queries get traced. In practice, it is very difficult to find the
>> relevant queries that we want to investigate, so we have often resorted to
>> bulk loading the traces into an external tool for analysis. This seems
>> sub-optimal when Cassandra could remove much of the friction.
>>
>> I have a few proposals to improve tracing that I'd like to throw out to
>> the mailing list to get feedback before I start implementing.
>>
>> 1. Include trace_probability as a CF-level property, so sampled tracing can
>> be enabled on a per-CF basis, cluster-wide, by changing the CF property.
>> https://issues.apache.org/jira/browse/CASSANDRA-13154
>>
>> 2. Allow tracing at the CFS level. If we have a misbehaving host, then it
>> would be useful to enable sampled tracing at the CFS layer on just that
>> host so that we can investigate queries landing on that replica, rather
>> than just queries passing through as a coordinator as is currently
>> possible.
>> https://issues.apache.org/jira/browse/CASSANDRA-13155
>>
>> 3. Add an interface allowing for custom filters which can decide whether
>> tracing should be enabled for a given query. This is a similar idea to
>> CASSANDRA-9193 [1], but it follows the same pattern that we have for
>> IAuthenticator, IEndpointSnitch, ConfigurationLoader et al.: useful default
>> implementations are provided, but the abstraction allows custom
>> implementations to be written for deployments where a specific type of
>> functionality is required. This would then allow solutions such as
>> CASSANDRA-11012 [2] without any specific support needing to be written in
>> Cassandra.
>> https://issues.apache.org/jira/browse/CASSANDRA-13156
>>
>> Thanks for reading!
>> Regards,
>>
>> Sam
>>
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-9193 Facility to write
>> dynamic code to selectively trigger trace or log for queries
>>
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-11012 Allow tracing CQL
>> of a specific client only, based on IP (range)
>
> Not directly related, but to make (3) more useful it would also be
> great to be able to list currently executing queries. I've had
> multiple cases where read queries would just use all my slots and
> never finish, and it was quite painful to discover what the query was
> exactly (slow-query logging doesn't help if the query never finishes).
>
>
> --
> Corentin Chary
> http://xf.iksaif.net
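
For readers skimming the thread, here is a rough sketch of what the pluggable
filter in proposal (3) might look like, with a per-table sampling filter in the
spirit of proposal (1) and a client-address filter in the spirit of
CASSANDRA-11012. Every name below (QueryInfo, TraceFilter,
TableProbabilityFilter, ClientPrefixFilter) is hypothetical and invented for
illustration. None of it is Cassandra's API or the design in the linked
tickets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.DoubleSupplier;

// All types here are hypothetical; they do not exist in Cassandra.

/** The minimum a filter is assumed to know about an incoming query. */
final class QueryInfo {
    final String keyspace;
    final String table;
    final String clientAddress;

    QueryInfo(String keyspace, String table, String clientAddress) {
        this.keyspace = keyspace;
        this.table = table;
        this.clientAddress = clientAddress;
    }
}

/** Pluggable decision point: should this query be traced? */
interface TraceFilter {
    boolean shouldTrace(QueryInfo query);
}

/** Samples a configurable fraction of queries for each table. */
final class TableProbabilityFilter implements TraceFilter {
    private final Map<String, Double> probabilityByTable = new ConcurrentHashMap<>();
    private final DoubleSupplier random; // must return values in [0, 1)

    TableProbabilityFilter(DoubleSupplier random) {
        this.random = random;
    }

    void setProbability(String keyspace, String table, double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        probabilityByTable.put(keyspace + "." + table, probability);
    }

    @Override
    public boolean shouldTrace(QueryInfo query) {
        // Tables with no configured probability are never traced.
        double p = probabilityByTable.getOrDefault(query.keyspace + "." + query.table, 0.0);
        return random.getAsDouble() < p;
    }
}

/** Traces only clients whose address starts with a given textual prefix. */
final class ClientPrefixFilter implements TraceFilter {
    private final String prefix; // e.g. "10.0.0." (a crude stand-in for a real IP range check)

    ClientPrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean shouldTrace(QueryInfo query) {
        return query.clientAddress.startsWith(prefix);
    }
}
```

In a real deployment the random source would presumably be something like
() -> ThreadLocalRandom.current().nextDouble(). Injecting it as a
DoubleSupplier just keeps the sampling decision deterministic under test.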
