Re: [DISCUSSION] Framework for Internal Collection Exposure and Monitoring API Alignment

Maxim Muzafarov Tue, 12 Dec 2023 13:05:31 -0800

Hello everyone,

I still think Cassandra will benefit from having this idea implemented
and used throughout the source code, so I've done another round of
rethinking this concept and it seems I've found a solution. As a
result, we can significantly reduce the cost of implementing and
maintaining both new and existing virtual tables and make our users
happier by letting them see everything they need through virtual tables.

So, I think we should limit the scope of the original proposal to the following:
## A framework for exposing any internal data collection to virtual
tables ONLY. ##

As a proof of concept, I took the CASSANDRA-14572 "Expose all table
metrics in virtual table" JIRA ticket, which provides a good
opportunity to demonstrate how to export all metrics to VTs at once
without having boilerplate implementations. Currently, we already have
CQLMetricsTable, BatchMetricsTable, etc. that expose metrics to VTs in
a pretty similar way, while most of the metric groups located under
the org.apache.cassandra.metrics package still lack a VT
representation. I've used the MetricRegistry collection
as a view of registered metrics to export them to VTs using the
prototype.

The prototype is complete. You can run a node locally and check the
available virtual tables with cqlsh, or you can check the changes
using the following link to the PR:
https://github.com/apache/cassandra/pull/2958/files

Below are some key points about the design itself:

1. All new virtual tables with metrics have "metric" as a prefix so
that they are fairly easy to find using TAB on the cqlsh command line.
Metrics are split into virtual tables as they are listed in the
org.apache.cassandra.metrics package, e.g. metrics_cql, metrics_tcm,
etc. In addition, they are also grouped by metric type, e.g.
metric_type_histogram, metric_type_counter, etc. There is a table
called "metric_all_metric_groups" with all available metric groups.

2. To create a new virtual table representation of an internal
collection a developer needs to do two things: create a virtual table
row representation, and register it using
CollectionVirtualTableAdapter, which acts as an adapter between
internal data and a virtual table. Here's how I did it for the thread
pools VT; this is a fully backward-compatible change:
https://github.com/apache/cassandra/pull/2958/files#diff-5fda13a633723cdf232bba465e6fb7ab74cdc02f7ba55dae4d1cf494ffb71abaR61

3. The "metrics_keyspace" virtual table ended up being quite large
since it contains all the metrics for all available keyspaces on a
local node, so the default implementation provided by
AbstractVirtualTable is not suitable for the proposal. The
AbstractVirtualTable materializes a full data collection on the heap
using SimpleDataSet, regardless of the portion of data that is being
queried. In this case, we have to use an iterative approach, as the
CollectionVirtualTableAdapter does (the problem was discussed in
CASSANDRA-14629 and is now a part of the solution). This also helps to
keep the memory footprint low.

4. Another valuable change is the CassandraMetricsRegistry itself. The
problem here is that the metrics and their aliases are currently
exported to JMX, but the implemented virtual tables export the metrics
in their own way and in most cases don't respect the metric aliases
that are registered in the MetricsRegistry. This should be fixed as
part of CASSANDRA-14572 to avoid ambiguity for all known metrics
once and for all.
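
To illustrate points 2 and 3, here is a minimal, self-contained sketch.
It is not code from the PR: RowSketch, PoolStats,
CollectionAdapterSketch and AdapterDemo are hypothetical names made up
for this example and do not reflect the actual
CollectionVirtualTableAdapter API. It only shows the general idea,
assuming a row view over an internal object plus an adapter that
converts elements to rows lazily, one at a time, instead of copying the
whole collection onto the heap first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical row view: one instance describes one virtual table row.
interface RowSketch
{
    String key();   // stands in for a partition key column
    String value(); // stands in for a regular column
}

// Hypothetical internal object to expose (not a Cassandra class).
final class PoolStats
{
    final String name;
    final int activeTasks;

    PoolStats(String name, int activeTasks)
    {
        this.name = name;
        this.activeTasks = activeTasks;
    }
}

// Hypothetical adapter between an internal collection and its rows.
final class CollectionAdapterSketch<T>
{
    private final Iterable<T> source;
    private final Function<T, RowSketch> toRow;
    int converted = 0; // counts conversions, only to make laziness visible

    CollectionAdapterSketch(Iterable<T> source, Function<T, RowSketch> toRow)
    {
        this.source = source;
        this.toRow = toRow;
    }

    // Rows are produced on demand; nothing is materialized up front.
    Iterator<RowSketch> rows()
    {
        final Iterator<T> it = source.iterator();
        return new Iterator<RowSketch>()
        {
            public boolean hasNext() { return it.hasNext(); }

            public RowSketch next()
            {
                converted++;
                return toRow.apply(it.next());
            }
        };
    }
}

final class AdapterDemo
{
    // Step 1: describe how one internal element looks as a row.
    static RowSketch toRow(final PoolStats p)
    {
        return new RowSketch()
        {
            public String key() { return p.name; }
            public String value() { return Integer.toString(p.activeTasks); }
        };
    }

    // Step 2: register the collection with the adapter.
    static CollectionAdapterSketch<PoolStats> build()
    {
        List<PoolStats> pools = new ArrayList<>();
        pools.add(new PoolStats("ReadStage", 3));
        pools.add(new PoolStats("MutationStage", 7));
        pools.add(new PoolStats("CompactionExecutor", 1));
        return new CollectionAdapterSketch<PoolStats>(pools, AdapterDemo::toRow);
    }

    public static void main(String[] args)
    {
        CollectionAdapterSketch<PoolStats> table = build();
        RowSketch first = table.rows().next();
        System.out.println(first.key() + " -> " + first.value());
        System.out.println("rows converted so far: " + table.converted);
    }
}
```

In this sketch, reading a single row converts a single element, so the
cost follows the portion of data actually read rather than the size of
the collection. How the real adapter handles filtering by partition key
or paging is not covered here.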

Here are the links to the issue and the PR:
https://issues.apache.org/jira/browse/CASSANDRA-14572
https://github.com/apache/cassandra/pull/2958/files

I'm excited about how these changes look right now, so please share
your feedback and thoughts.
The PR still lacks good test coverage; I'll fix that as soon as we have
a clear vision of the design (or much sooner) :-)

On Mon, 30 Jan 2023 at 17:43, David Capwell <dcapw...@apple.com> wrote:
>
> I *think* this is arguably true for a vtable / CQL-based solution as well 
> from the "you don't know how people are using your API" perspective.
>
>
> Very fair point and think that justifies a different thread to talk about 
> backwards compatibility for our tables (virtual and not); maybe we can lump 
> together the JMX backwards compatibility conversation as well in that new 
> thread.
>
> On Jan 28, 2023, at 4:09 PM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> First off - thanks so much for putting in this effort Maxim! This is 
> excellent work.
>
> Some thoughts on the CEP and responses in thread:
>
> Considering that JMX is usually not used and disabled in production 
> environments for various performance and security reasons, the operator may 
> not see the same picture from various of Dropwizard's metrics exporters and 
> integrations as Cassandra's JMX metrics provide [1][2].
>
> I don't think this assertion is true. Cassandra is running in a lot of places 
> in the world, and JMX has been in this ecosystem for a long time; we need 
> data that is basically impossible to get to claim "JMX is usually not used in 
> C* environments in prod".
>
> I also wonder about if we should care about JMX?  I know many wish to migrate 
> (its going to be a very long time) away from JMX, so do we need a wrapper to 
> make JMX and vtables consistent?
>
> If we can move away from a bespoke vtable or JMX based implementation and 
> instead have a templatized solution each of these is generated from, that to 
> me is the superior option. There's little harm in adding new JMX endpoints 
> (or hell, other metrics framework integration?) as a byproduct of adding new 
> vtable exposed metrics; we have the same maintenance obligation to them as we 
> have to the vtables and if it generates from the same base data, we shouldn't 
> have any further maintenance burden due to its presence right?
>
> we wish to move away from JMX
>
> I do, and you do, and many people do, but I don't believe all people on the 
> project do. The last time this came up in slack the conclusion was "Josh 
> should go draft a CEP to chart out a path to moving off JMX while maintaining 
> backwards-compat w/existing JMX metrics for environments that are using them" 
> (so I'm excited to see this CEP pop up before I got to it! ;)). Moving to a 
> system that gives us a 0-cost way to keep JMX and vtable in sync over time on 
> new metrics seems like a nice compromise for folks that have built out 
> JMX-based maintenance infra on top of C*. Plus removing the boilerplate toil 
> on vtables. win-win.
>
> If we add a column to the end of the JMX row did we just break users?
>
> I *think* this is arguably true for a vtable / CQL-based solution as well 
> from the "you don't know how people are using your API" perspective. Unless 
> we have clear guidelines about discretely selecting the columns you want from 
> a vtable and trust users to follow them, if people have brittle greedy 
> parsers pulling in all data from vtables we could very well break them as 
> well by adding a new column right? Could be wrong here; I haven't written 
> anything that consumes vtable metric data and maybe the obvious idiom in the 
> face of that is robust in the presence of column addition. /shrug
>
> It's certainly more flexible and simpler to write to w/out detonating 
> compared to JMX, but it's still an API we'd be revving.
>
> On Sat, Jan 28, 2023, at 4:24 PM, Ekaterina Dimitrova wrote:
>
> Overall I have similar thoughts and questions as David.
>
> I just wanted to add a reminder about this thread from last summer[1]. We 
> already have issues with the alignment of JMX and Settings Virtual Table. I 
> guess this is how Maxim got inspired to suggest this framework proposal which 
> I want to thank him for! (I noticed he assigned CASSANDRA-15254)
>
> Not to open the Pandora box, but to me the most important thing here is to 
> come into agreement about the future of JMX and what we will do or not as a 
> community. Also, how much time people are able to invest. I guess this will 
> influence any directions to be taken here.
>
> [1]
> https://lists.apache.org/thread/8mjcwdyqoobpvw2262bqmskkhs76pp69
>
>
> On Thu, 26 Jan 2023 at 12:41, David Capwell <dcapw...@apple.com> wrote:
>
> I took a look and I see the result is an interface that looks like the vtable 
> interface, that is then used by vtables and JMX?  My first thought is why not 
> just use the vtable logic?
>
> I also wonder about if we should care about JMX?  I know many wish to migrate 
> (its going to be a very long time) away from JMX, so do we need a wrapper to 
> make JMX and vtables consistent?  I am cool with something like the following
>
> registerWithJMX(jmxName, query("SELECT * FROM system_views.streaming"));
>
>
> So if we want to have a JMX view that matches the table then that’s cool by 
> me, but one thing that has been brought up in reviews is backwards 
> compatibility with regard to adding columns… If we add a column to the end of 
> the JMX row did we just break users?
>
> Considering that JMX is usually not used and disabled in production 
> environments for various performance and security reasons, the operator may 
> not see the same picture from various of Dropwizard's metrics exporters
>
> If this is a real problem people are hitting, we can always add the ability 
> to push metrics to common systems with a pluggable way to add non-standard 
> solutions.  Dropwizard already support this so would be low hanging fruit to 
> address this.
>
> To make the proposed changes backwards compatible with the previous version 
> of Cassandra, all MBeans and Virtual Tables we already have will remain 
> unchanged
>
>
> If this is for new JMX endpoints moving forward, I am not sure of the benefit 
> for the same reason listed above; we wish to move away from JMX
>
> On Jan 25, 2023, at 10:51 AM, Maxim Muzafarov <mmu...@apache.org> wrote:
>
> Hello Cassandra Community,
>
>
> I've been faced with a number of inconsistencies in the user APIs of
> the internal data collections representation exposed through the
> Cassandra monitoring interfaces that need to be fully aligned from an
> operator perspective. First of all, I'm highlighting JMX, Dropwizard
> Metrics, and Virtual Tables user interfaces. In order to address all
> these inconsistencies, I have created a draft enhancement proposal
> that describes everything I have found and how we can fix it once and
> for all.
>
> I'd like to hear your opinion and thoughts on it. Please take a look:
> https://docs.google.com/document/d/1j4J3bPWjQkAU9x4G-zxKObxPrKg36jLRT6xpUoNJa8Q
>
>
> --
> Maxim Muzafarov
>
>
