I do miss this from other RDBMSs. If you could come up with a light-touch way to do this, I think a lot of people would be quite happy about it.
On Wed, Jan 25, 2017 at 2:02 PM, Corentin Chary <corentin.ch...@gmail.com> wrote:
> On Wed, Jan 25, 2017 at 9:55 PM, Sam Overton <samover...@gmail.com> wrote:
>> Hello cassandra-dev,
>>
>> I would like to continue the momentum on improving Cassandra's tracing,
>> following Mick's excellent work on pluggable tracing and Zipkin support.
>>
>> There are a couple of areas we can improve that would make tracing an
>> even more useful tool for cluster operators to diagnose ongoing issues.
>>
>> The control we currently have over tracing is coarse and somewhat
>> cumbersome. Enabling tracing from the client for a specific query is
>> fine for application developers, particularly in an environment where
>> Zipkin is being used to trace all parts of the system and show an
>> aggregated view. For an operator investigating an issue however, this
>> does not always give us the control that we need in order to obtain
>> relevant data. We often need to diagnose an issue without the
>> possibility of making any changes in the client, and often without the
>> prior knowledge of which queries at the application level are
>> experiencing poor performance.
>>
>> Our only other instigator of tracing is nodetool settraceprobability
>> which only affects a single node and gives us no control over precisely
>> which queries get traced. In practise, it is very difficult to find the
>> relevant queries that we want to investigate, so we have often resorted
>> to bulk loading the traces into an external tool for analysis, and this
>> seems sub-optimal when cassandra could reduce much of the friction.
>>
>> I have a few proposals to improve tracing that I'd like to throw out to
>> the mailing list to get feedback before I start implementing.
>>
>> 1. Include trace_probability as a CF level property, so sampled tracing
>> can be enabled on a per-CF basis, cluster-wide, by changing the CF
>> property.
>> https://issues.apache.org/jira/browse/CASSANDRA-13154
>>
>> 2. Allow tracing at the CFS level. If we have a misbehaving host, then
>> it would be useful to enable sampled tracing at the CFS layer on just
>> that host so that we can investigate queries landing on that replica,
>> rather than just queries passing through as a coordinator as is
>> currently possible.
>> https://issues.apache.org/jira/browse/CASSANDRA-13155
>>
>> 3. Add an interface allowing for custom filters which can decide whether
>> tracing should be enabled for a given query. This is a similar idea to
>> CASSANDRA-9193 [1] but following the same pattern that we have for
>> IAuthenticator, IEndpointSnitch, ConfigurationLoader et al. where the
>> intention is that useful default implementations are provided, but
>> abstracted in such a way that custom implementations can be written for
>> deployments where a specific type of functionality is required. This
>> would then allow solutions such as CASSANDRA-11012 [2] without any
>> specific support needing to be written in Cassandra.
>> https://issues.apache.org/jira/browse/CASSANDRA-13156
>>
>> Thanks for reading!
>> Regards,
>>
>> Sam
>>
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-9193 Facility to
>> write dynamic code to selectively trigger trace or log for queries
>>
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-11012 Allow tracing
>> CQL of a specific client only, based on IP (range)
>
> Not directly related, but to make (3) more useful it would also be
> great to be able to list currently executing queries. I've had
> multiple cases where read queries would just use all my slots and
> never finish and it was quite painful to discover what the query was
> exactly (slow query don't help if the query never finishes).
>
>
> --
> Corentin Chary
> http://xf.iksaif.net
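
For anyone skimming the thread, here is a rough, language-neutral sketch in Python of what the filter idea in proposal (3) might look like. It is not Cassandra code and not Sam's design. Every name in it (QueryInfo, TraceFilter, TableProbabilityFilter, ClientNetworkFilter, should_trace) is hypothetical and made up for illustration. It only shows how per-table sampling (proposal 1) and client-IP matching (the CASSANDRA-11012 use case) could both sit behind one small interface.

```python
import ipaddress
import random
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class QueryInfo:
    """Hypothetical description of an incoming query (not a Cassandra class)."""
    keyspace: str
    table: str
    client_ip: str


class TraceFilter(Protocol):
    """Hypothetical plug-in point: decide whether a query should be traced."""

    def should_trace(self, query: QueryInfo) -> bool: ...


class TableProbabilityFilter:
    """Sample queries with a per-table probability (the idea in proposal 1)."""

    def __init__(self, probabilities: Dict[str, float],
                 rng: Optional[random.Random] = None):
        # Keys are "keyspace.table"; tables that are not listed are never traced.
        self._probabilities = probabilities
        self._rng = rng or random.Random()

    def should_trace(self, query: QueryInfo) -> bool:
        p = self._probabilities.get(f"{query.keyspace}.{query.table}", 0.0)
        # random() returns a value in [0.0, 1.0), so p == 0.0 never traces
        # and p == 1.0 always traces.
        return self._rng.random() < p


class ClientNetworkFilter:
    """Trace only queries from clients inside a given network (CIDR range)."""

    def __init__(self, cidr: str):
        self._network = ipaddress.ip_network(cidr)

    def should_trace(self, query: QueryInfo) -> bool:
        return ipaddress.ip_address(query.client_ip) in self._network
```

The point of the shape is that the server would only ever call should_trace, so a deployment could swap in its own rule without any change to the server itself, much like the existing pluggable interfaces mentioned above.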