[
https://issues.apache.org/jira/browse/CASSANDRA-14572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799592#comment-17799592
]
Maxim Muzafarov commented on CASSANDRA-14572:
---------------------------------------------
I have also prepared some benchmarks for the patch, comparing a new iterative
virtual table implementation with the one that we have now.
h3. The _thread_pools_ virtual table implementation changed
As you can see in the patch, I've removed the old implementation of the
_thread_pools_ virtual table and switched its creation to
CollectionVirtualTableAdapter. This change is fully backwards compatible as the
table has the same columns and column types, and it gives us a good opportunity
to benchmark the solution. I ran the following query against both the latest
stable 4.1.3 release and the patch:
{code:java}
SELECT * FROM system_views.thread_pools;
{code}
The difference between queries to the old and new table is within measurement
error, which I think is good. The difference of _-50%_ at the edge of the max
values can be explained by the iterative approach, which puts less pressure on
the GC; that is also good. See the screenshot below:
!thread_pools benchmark.png|width=80%!
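To illustrate why an iterative approach can reduce GC pressure, here is a minimal sketch contrasting eager and lazy row production. It is not the code from the patch: the class {{RowProduction}} and the methods {{eagerRows}} and {{lazyRows}} are hypothetical names, and the rows are plain strings rather than real virtual table rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; this is not the CollectionVirtualTableAdapter code.
final class RowProduction {
    // Eager: copies every source element into a row list before the caller sees anything.
    static <T, R> List<R> eagerRows(Iterable<T> source, Function<T, R> toRow) {
        List<R> rows = new ArrayList<>();
        for (T item : source) {
            rows.add(toRow.apply(item));
        }
        return rows;
    }

    // Lazy: converts one element per next() call and keeps no intermediate list.
    static <T, R> Iterator<R> lazyRows(Iterable<T> source, Function<T, R> toRow) {
        Iterator<T> it = source.iterator();
        return new Iterator<R>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public R next() { return toRow.apply(it.next()); }
        };
    }

    public static void main(String[] args) {
        // Illustrative pool names only.
        List<String> pools = List.of("ReadStage", "MutationStage", "CompactionExecutor");
        Iterator<String> rows = lazyRows(pools, name -> "row(" + name + ")");
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

With the lazy variant each row becomes garbage as soon as the consumer is done with it, whereas the eager variant keeps the whole result alive until the list is dropped.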
h3. A new _keyspace_group_ virtual table
This new virtual table displays all the metrics related to all known
keyspaces in the cluster. This table can be quite large, so I've prepared some
benchmarks for it as well. I've created 1000 keyspaces for the test case. The
_count_ over the _keyspace_group_ table:
{code:java}
cqlsh> select count(*) from system_metrics.keyspace_group ;
count
-------
77462
(1 rows)
{code}
The query that I used selects a metric by partition:
{code:java}
select * from system_metrics.keyspace_group where name = ?
{code}
I think the benchmark results are quite reasonable in this case, given that we
still have to iterate over the entire set of 77k rows each time the query is
executed. The internal data collections that we are trying to expose do not
store their data in the order that the virtual tables need. In general,
checking a single metric through the large table should be quite fast (<1s),
but continuously exporting metrics from virtual tables will create GC pressure.
!keyspayces_group responses times.png|width=80%!
!keyspayces_group summary.png|width=80%!
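The cost of selecting one partition from an unordered collection can be sketched as follows. This is a simplified model under stated assumptions, not the patch's code: {{PartitionScan}}, {{MetricRow}}, {{scanByName}} and {{indexByName}} are hypothetical names, the metric names are made up, and the keyed index is shown only as a contrast to the full scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a metrics table; names and values are illustrative only.
final class PartitionScan {
    static final class MetricRow {
        final String name;
        final long value;
        MetricRow(String name, long value) { this.name = name; this.value = value; }
    }

    // Full scan: every row is visited even though only one partition matches.
    static List<MetricRow> scanByName(Iterable<MetricRow> rows, String name) {
        List<MetricRow> matches = new ArrayList<>();
        for (MetricRow row : rows) {
            if (row.name.equals(name)) matches.add(row);
        }
        return matches;
    }

    // Index built once up front: a later lookup is a single hash probe.
    static Map<String, List<MetricRow>> indexByName(Iterable<MetricRow> rows) {
        Map<String, List<MetricRow>> index = new HashMap<>();
        for (MetricRow row : rows) {
            index.computeIfAbsent(row.name, k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    public static void main(String[] args) {
        // Same row count as the count(*) result above; the names are invented.
        List<MetricRow> rows = new ArrayList<>();
        for (int i = 0; i < 77_462; i++) {
            rows.add(new MetricRow("metric_" + i, i));
        }
        String wanted = "metric_500";
        System.out.println(scanByName(rows, wanted).size());
        System.out.println(indexByName(rows).get(wanted).size());
    }
}
```

An index like this costs extra memory and has to be kept in sync with the underlying metrics, so the full scan may still be the simpler trade-off for occasional single-metric lookups.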
> Expose all table metrics in virtual table
> -----------------------------------------
>
> Key: CASSANDRA-14572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14572
> Project: Cassandra
> Issue Type: New Feature
> Components: Legacy/Observability, Observability/Metrics
> Reporter: Chris Lohfink
> Assignee: Maxim Muzafarov
> Priority: Low
> Labels: virtual-tables
> Fix For: 5.x
>
> Attachments: keyspayces_group responses times.png, keyspayces_group
> summary.png, systemv_views.metrics_dropped_message.png, thread_pools
> benchmark.png
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> While we want a number of virtual tables to display data in a way that's great
> and intuitive like in nodetool, there is also much to be said for being able
> to expose the metrics we have for tooling via CQL instead of JMX. This is more
> for the tooling and ad hoc advanced users who know exactly what they are
> looking for.
> *Schema:*
> Initial idea is to expose data via {{((keyspace, table), metric)}} with a
> column for each metric value. Could also use a Map or UDT instead of the
> column based that can be a bit more specific to each metric type. To that end
> there can be a {{metric_type}} column and then a UDT for each metric type
> filled in, or a single value with more of a Map<Text, Text> style. I am
> proposing the column type though, as with {{ALLOW FILTERING}} it does allow
> more extensive query capabilities.
> *Implementations:*
> * Use reflection to grab all the metrics from TableMetrics (see:
> CASSANDRA-7622 impl). This is easiest and least abrasive towards new metric
> implementors... but it's reflection and kind of a bad idea.
> * Add a hook in TableMetrics to register with this virtual table when
> registering
> * Pull from the CassandraMetrics registry (either reporter or iterate
> through metrics query on read of virtual table)