[ 
https://issues.apache.org/jira/browse/CASSANDRA-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318912#comment-17318912
 ] 

Paulo Motta commented on CASSANDRA-16513:
-----------------------------------------

Hi [~Si_d], thanks for your interest in working on this ticket. I updated the 
ticket description to reflect the previous discussion.

Currently you can query a [virtual 
table|https://cassandra.apache.org/doc/latest/new/virtualtables.html] with 
[cqlsh|https://cassandra.apache.org/doc/latest/tools/cqlsh.html]:
{code:java}
cqlsh:system_views> SELECT * FROM sstable_tasks;
 keyspace_name | table_name | task_id                              | kind       | progress | total    | unit
---------------+------------+--------------------------------------+------------+----------+----------+-------
         basic |      wide2 | c3909740-cdf7-11e9-a8ed-0f03de2d9ae1 | compaction | 60418761 | 70882110 | bytes
         basic |      wide2 | c7556770-cdf7-11e9-a8ed-0f03de2d9ae1 | compaction |  2995623 | 40314679 | bytes
{code}
However, before virtual tables were a thing, operators could query this 
information more easily with 
[nodetool|https://cassandra.apache.org/doc/latest/tools/nodetool/nodetool.html]:
{code:java}
nodetool compactionstats
pending tasks: 5
          compaction type        keyspace           table       completed           total      unit  progress
               Compaction       Keyspace1       Standard1       282310680       302170540     bytes    93.43%
               Compaction       Keyspace1       Standard1        58457931       307520780     bytes    19.01%
Active compaction remaining time :   0h00m16s
{code}
The benefit of the first approach is that it's very easy for developers to add 
new virtual tables to Cassandra, since you only need to do this on the server 
side and the client can simply query the virtual table with cqlsh.

The benefit of the second approach is that operators are already used to 
discovering and querying system information via nodetool, and the information 
comes nicely formatted for human consumption, potentially in different formats. 
The downside is that every new piece of information added on the server 
requires a new client-side nodetool command.
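
As a small illustration of the client-side formatting nodetool adds, the 
progress column in the compactionstats output above is consistent with the 
completed/total ratio shown as a percentage. A minimal Python sketch of that 
step, where the function name progress_percent is hypothetical and not part of 
Cassandra:

```python
def progress_percent(completed: int, total: int) -> str:
    """Format completed/total as a percentage with two decimals."""
    if total <= 0:
        # Avoid dividing by zero when the total is unknown or empty.
        return "n/a"
    return f"{completed / total * 100:.2f}%"


# (completed, total) pairs copied from the nodetool sample above.
samples = [(282310680, 302170540), (58457931, 307520780)]
for completed, total in samples:
    print(progress_percent(completed, total))
```

The same calculation could be applied to the progress and total columns of the 
sstable_tasks virtual table.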

The idea here is to add a new tool with the best of both worlds: allow 
developers to easily add new information that operators can query via virtual 
tables, while providing a simple way for operators to query this data and 
export it to different formats (such as JSON or YAML) via a CLI.

The good thing about this task is that we can do it very incrementally: we can 
start by implementing "admintool show sstable_tasks" to display the 
information above, and then add new virtual tables to it one at a time. All 
virtual tables should expose the information in tabular, JSON and YAML format 
(depending on the format parameter), but in the future each virtual table can 
optionally implement a new data formatter to pretty-print its data in 
different formats.
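
One possible shape for the format parameter and the optional per-table 
formatters is a lookup that prefers a table-specific formatter and falls back 
to a generic one. The Python sketch below is only an illustration under that 
assumption: the names render, DEFAULT_FORMATTERS and CUSTOM_FORMATTERS are 
hypothetical, only JSON and YAML are shown, and the YAML emitter handles flat 
rows only.

```python
import json


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_yaml(rows):
    """Emit flat, non-empty rows as a YAML sequence of mappings.

    Values are written as JSON scalars, which are also valid YAML scalars.
    """
    lines = []
    for row in rows:
        for i, (key, value) in enumerate(row.items()):
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {json.dumps(value)}")
    return "\n".join(lines)


# Generic formatters, keyed by the value of the format parameter.
DEFAULT_FORMATTERS = {"json": to_json, "yaml": to_yaml}

# Hypothetical per-table overrides, keyed by (table name, format).
CUSTOM_FORMATTERS = {}


def render(table, rows, fmt):
    """Use a table-specific formatter if registered, else the generic one."""
    formatter = CUSTOM_FORMATTERS.get((table, fmt)) or DEFAULT_FORMATTERS.get(fmt)
    if formatter is None:
        raise ValueError(f"unsupported format: {fmt}")
    return formatter(rows)


# One row taken from the sstable_tasks sample above (task_id omitted).
rows = [{"keyspace_name": "basic", "table_name": "wide2", "kind": "compaction",
         "progress": 60418761, "total": 70882110, "unit": "bytes"}]
print(render("sstable_tasks", rows, "yaml"))
```

With this layout, adding a pretty-printer for a single table would only mean 
registering one more entry in CUSTOM_FORMATTERS.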

I think a simple way to start is to create a simple tool that just fetches and 
displays the contents of the "system_views.sstable_tasks" table in tabular 
format.
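
To make the tabular part concrete, here is a minimal Python sketch of a 
cqlsh-like renderer. It assumes the rows have already been fetched (for 
example with a CQL driver, which is not shown) and are available as 
dictionaries; the function name format_table is hypothetical.

```python
def format_table(columns, rows):
    """Render rows as an aligned text table, similar in spirit to cqlsh."""
    cells = [[str(row[c]) for c in columns] for row in rows]
    # Each column is as wide as its header or its longest value.
    widths = [max([len(c)] + [len(r[i]) for r in cells])
              for i, c in enumerate(columns)]
    header = " | ".join(c.ljust(w) for c, w in zip(columns, widths))
    rule = "-+-".join("-" * w for w in widths)
    body = [" | ".join(v.rjust(w) for v, w in zip(r, widths)) for r in cells]
    return "\n".join([header, rule] + body)


# Rows copied from the sstable_tasks sample above.
columns = ["keyspace_name", "table_name", "task_id", "kind",
           "progress", "total", "unit"]
rows = [
    {"keyspace_name": "basic", "table_name": "wide2",
     "task_id": "c3909740-cdf7-11e9-a8ed-0f03de2d9ae1", "kind": "compaction",
     "progress": 60418761, "total": 70882110, "unit": "bytes"},
    {"keyspace_name": "basic", "table_name": "wide2",
     "task_id": "c7556770-cdf7-11e9-a8ed-0f03de2d9ae1", "kind": "compaction",
     "progress": 2995623, "total": 40314679, "unit": "bytes"},
]
print(format_table(columns, rows))
```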

> Add tool to display or export the contents of a virtual table
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-16513
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16513
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Observability/Metrics, Tool/nodetool
>            Reporter: Paulo Motta
>            Priority: Normal
>              Labels: gsoc2021, mentor
>
> Several virtual tables were recently added, but they're currently only 
> accessible via cqlsh or programmatically. While this is valuable for many use 
> cases, operators are accustomed to the convenience of querying system 
> metrics with a simple nodetool command.
> In addition to that, a relatively common request is to provide nodetool 
> output in different formats (JSON, YAML and even XML) (CASSANDRA-5977, 
> CASSANDRA-12035, CASSANDRA-12486, CASSANDRA-12698, CASSANDRA-12503). However 
> this requires lots of manual labor as each nodetool subcommand needs to be 
> adapted to support new output formats.
> I propose adding a new CLI tool that will consistently print to the standard 
> output the contents of a virtual table. By default the command will print the 
> output in a tabular format similar to cqlsh, but a "--format" parameter can 
> be specified to modify the output to some other format like JSON or YAML.
> It should be possible to limit the number of rows displayed and to filter 
> the output to rows with specific keys (i.e. keyspace or table). 
> The command should be flexible and provide simple hooks for registration and 
> customization of new virtual tables.
> My vision is that this is a path towards deprecating JMX and toward CQL for 
> management, as we move information currently available through JMX to virtual 
> tables (as CASSANDRA-14457 did with compactionstats) and easily expose them 
> in this new tool as more virtual tables are added. Eventually we can also add 
> setters when we start supporting writeable virtual tables.
> I propose calling this tool admintool (naming bikeshedding welcome), for 
> example:
> {noformat}
> admintool help
> admintool <subcommand> <entity>
> Available subcommands and entities are:
> subcommands:
>  - show
>  - set (future)
> entities:
>  - caches
>  - internode_inbound
>  - internode_outbound
>  - settings
>  - sstable_tasks
>  - system_properties
>  - thread_pools
> admintool show clients --format yaml
> ...
> admintool show internode_outbound --format json
> ...
> admintool show sstable_tasks --filter keyspace=my_ks --filter table=my_table
> ...
> {noformat}
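
The filter and limit behaviour described in the ticket could be prototyped on 
the client side roughly as follows. This is a Python sketch under several 
assumptions: the "--limit" flag name, the helper names parse_filters and 
select_rows, and the third sample row are all made up for illustration, and 
filter keys are matched against column names exactly (mapping a short key such 
as "keyspace" to "keyspace_name" is not addressed).

```python
import argparse


def parse_filters(pairs):
    """Turn ["keyspace_name=basic", ...] into {"keyspace_name": "basic", ...}."""
    filters = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got: {pair!r}")
        filters[key] = value
    return filters


def select_rows(rows, filters, limit=None):
    """Keep rows whose columns equal every filter value, then apply the limit."""
    matched = [r for r in rows
               if all(str(r.get(k)) == v for k, v in filters.items())]
    return matched if limit is None else matched[:limit]


parser = argparse.ArgumentParser(prog="admintool")
# A repeatable --filter option, as in the usage example of the ticket.
parser.add_argument("--filter", action="append", default=[], metavar="KEY=VALUE")
# Hypothetical flag name; the ticket only asks for a row limit.
parser.add_argument("--limit", type=int, default=None)
args = parser.parse_args(["--filter", "keyspace_name=basic",
                          "--filter", "table_name=wide2", "--limit", "1"])

# The first two rows follow the sample above; the third is invented.
rows = [
    {"keyspace_name": "basic", "table_name": "wide2", "progress": 60418761},
    {"keyspace_name": "basic", "table_name": "wide2", "progress": 2995623},
    {"keyspace_name": "other", "table_name": "t1", "progress": 1},
]
print(select_rows(rows, parse_filters(args.filter), args.limit))
```

Filtering could also be pushed into the CQL query itself; the sketch keeps it 
on the client only to stay self-contained.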


