Re: [DISCUSSION] CEP-38: CQL Management API

Maxim Muzafarov Wed, 03 Jan 2024 09:56:43 -0800

Happy New Year to everyone! I'd like to thank everyone for their
questions, because answering them forces us to move towards the right
solution, and I also like the ML discussions for the time they give to
investigate the code :-)

I'm deliberately trying to limit the scope of the initial solution
(e.g. exclude the agent part) to keep the discussion short and clear,
but it's also important to have a glimpse of what we can do next once
we've finished with the topic.

My view of the Command<> is that it is an abstraction in the broader
sense of an operation that can be performed on the local node,
involving one of a few internal components. This means that updating a
property in the settings virtual table via an update statement, or
executing e.g. the setconcurrentcompactors command are just aliases of
the same internal command via different APIs. Another example is the
netstats command, which simply aggregates the MessageService metrics
and returns them in a human-readable format (just another way of
looking at key-value metric pairs). More broadly, a command takes a
Map<String, String> as input and returns a String (or List<String>)
as the result.
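
To make the aliasing idea concrete, here is a minimal sketch, assuming a
Command shape with string-keyed arguments and a string result. The interface,
the method names and the in-memory counter are hypothetical stand-ins, not the
actual Cassandra classes; the point is only that a nodetool-style call and a
settings-table update reach the same command implementation.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class CommandSketch {
    /** Hypothetical shape: string-keyed arguments in, human-readable text out. */
    public interface Command {
        String name();
        String execute(Map<String, String> args);
    }

    /** Stand-in for an internal component; not the real compaction manager. */
    public static final AtomicInteger concurrentCompactors = new AtomicInteger(2);

    public static final Command SET_CONCURRENT_COMPACTORS = new Command() {
        public String name() { return "setconcurrentcompactors"; }
        public String execute(Map<String, String> args) {
            int value = Integer.parseInt(args.get("concurrent_compactors"));
            concurrentCompactors.set(value);
            return "concurrent_compactors=" + value;
        }
    };

    /** Entry point resembling a nodetool invocation with one positional value. */
    public static String nodetoolStyle(String value) {
        return SET_CONCURRENT_COMPACTORS.execute(Map.of("concurrent_compactors", value));
    }

    /** Entry point resembling an UPDATE on the settings virtual table. */
    public static String settingsTableStyle(String property, String value) {
        return SET_CONCURRENT_COMPACTORS.execute(Map.of(property, value));
    }

    public static void main(String[] args) {
        System.out.println(nodetoolStyle("5"));
        System.out.println(settingsTableStyle("concurrent_compactors", "7"));
    }
}
```

Both entry points only translate their own input into the argument map, so
the behaviour lives in one place.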

As Abe mentioned, Command and CommandRegistry should be largely based
on the nodetool command set at the beginning. We have a few options
for how we can initially construct command metadata during the
registry implementation (when moving command metadata from the
nodetool to the core part), so I'm planning to check them against the
command representations of the k8ssandra project, so that any later
adoption of the registry goes smoothly (by writing a test OpenAPI
registry exporter and comparing the resulting representations).

So, the MVP is the following:
- Command
- CommandRegistry
- CQLCommandExporter
- JMXCommandExporter
- the nodetool uses the JMXCommandExporter
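
The relationship between these pieces could be sketched roughly as follows.
This is an illustration under my own assumptions: the class and method names
(lookup, invoke, the "echo" command) are hypothetical, and a single Exporter
class stands in for both CQLCommandExporter and JMXCommandExporter. The idea
is that exporters hold no command logic and only route calls to the registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RegistrySketch {
    /** Hypothetical command shape: string arguments in, printable text out. */
    public interface Command {
        String name();
        String execute(Map<String, String> args);
    }

    /** Single source of truth for the commands a node supports. */
    public static final class CommandRegistry {
        private final Map<String, Command> commands = new LinkedHashMap<>();

        public void register(Command command) {
            if (commands.putIfAbsent(command.name(), command) != null)
                throw new IllegalArgumentException("Duplicate command: " + command.name());
        }

        public Command lookup(String name) {
            Command command = commands.get(name);
            if (command == null)
                throw new IllegalArgumentException("Unknown command: " + name);
            return command;
        }

        public Set<String> names() { return commands.keySet(); }
    }

    /** Publishes the registry through one API (label only; no real CQL or JMX here). */
    public static final class Exporter {
        private final String api;
        private final CommandRegistry registry;

        public Exporter(String api, CommandRegistry registry) {
            this.api = api;
            this.registry = registry;
        }

        public String invoke(String name, Map<String, String> args) {
            return "[" + api + "] " + registry.lookup(name).execute(args);
        }
    }

    /** Builds a registry with one toy command for the demo. */
    public static CommandRegistry demoRegistry() {
        CommandRegistry registry = new CommandRegistry();
        registry.register(new Command() {
            public String name() { return "echo"; }
            public String execute(Map<String, String> args) { return "echo " + args.get("text"); }
        });
        return registry;
    }

    public static void main(String[] args) {
        CommandRegistry registry = demoRegistry();
        System.out.println(new Exporter("cql", registry).invoke("echo", Map.of("text", "hi")));
        System.out.println(new Exporter("jmx", registry).invoke("echo", Map.of("text", "hi")));
    }
}
```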

= Answers =

> What do you have in mind specifically there? Do you plan on rewriting a brand 
> new implementation which would be partially inspired by our agent? Or would 
> the project integrate our agent code in-tree or as a dependency?

Personally, I like the state of the k8ssandra project as it is now. My
understanding is that the server part of a database always lags behind
the client and sidecar parts in terms of the jdk version and the
features it provides. In contrast, sidecars should always be on top of
the market, so if we want to make an agent part in-tree, this should
be carefully considered for the flexibility which we may lose, as we
will not be able to change the agent part within the sidecar. The only
closest change I can see is that we can remove the interceptor part
once the CQL command interface is available. I suggest we move the
agent part to phase 2 and research it. wdyt?

> How are the results of the commands expressed to the CQL client? Since the 
> command is being treated as CQL, I guess it will be rows, right? If yes, some 
> of the nodetool commands output are a bit hierarchical in nature (e.g. 
> cfstats, netstats etc...). How are these cases handled?

I think the result of the execution should be a simple string (or set
of strings), which by its nature matches the nodetool output. I would
avoid building complex output or output schemas for now to simplify
the initial changes.

> Any changes expected at client/driver side?

I'd like to keep the initial changes to the server part only, to avoid
scope inflation. For the driver part, I have checked the ExecutionInfo
interface provided by the java-driver, which should probably be used
as a command execution status holder. We'd like to have a unique
command execution id for each command that is executed on the node, so
the ExecutionInfo should probably hold such an id. Currently it has
the UUID getTracingId(), which is not well suited for our case and I
think further changes and follow-ups will be required here (including
the binary protocol, I think).

> The term COMMAND is a bit abstract I feel (subjective)... And I also feel the 
> settings part is overlapping with virtual tables.

I think we should keep the term Command as broad as possible. As
long as we have a single implementation of a command, and the cost of
maintaining that piece of the source code is low, it's even better if
we have a few ways to achieve the same result using different APIs.
Personally, the only thing I would vote for is the separation of
command and metric terms (they shouldn't be mixed up).

> How are the responses of different operations expressed through the Command 
> API? If the Command Registry Adapters depend upon the command metadata for 
> invoking/validating the command, then I think there has to be a way for them 
> to interpret the response format also, right?

I'm not sure I've understood the question correctly. Are you talking
about the command execution result schema and the validation of that
schema?

For now, I see the interface as follows, the result of the execution
is a type that can be converted to the same string as the nodetool has
for the corresponding command (so that the outputs match):

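import java.util.function.Consumer;

public interface Command<A, R>
{
    // Renders the result as the same text nodetool prints for this command.
    void printResult(A argument, R result, Consumer<String> printer);
}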
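
As a usage illustration only, a command implementing this interface might look
like the sketch below. The execute method, the PendingMessages command and its
hard-coded numbers are my assumptions for the example; only printResult comes
from the interface above. The printer callback lets each exporter decide where
the lines go (stdout for nodetool, result rows for CQL).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class PrintResultSketch {
    public interface Command<A, R> {
        R execute(A argument); // assumed here; not part of the interface shown above
        void printResult(A argument, R result, Consumer<String> printer);
    }

    /** Hypothetical command returning fixed key-value metrics for a named pool. */
    public static final class PendingMessages implements Command<String, Map<String, Long>> {
        public Map<String, Long> execute(String pool) {
            Map<String, Long> metrics = new LinkedHashMap<>(); // keeps insertion order
            metrics.put("pending", 3L);
            metrics.put("completed", 42L);
            return metrics;
        }

        public void printResult(String pool, Map<String, Long> result, Consumer<String> printer) {
            printer.accept("Pool: " + pool);
            result.forEach((key, value) -> printer.accept(key + ": " + value));
        }
    }

    /** Collects the printed lines, as a CQL exporter might before building rows. */
    public static List<String> render(String pool) {
        PendingMessages command = new PendingMessages();
        List<String> lines = new ArrayList<>();
        command.printResult(pool, command.execute(pool), lines::add);
        return lines;
    }

    public static void main(String[] args) {
        // A nodetool-like caller would simply print each line.
        PendingMessages command = new PendingMessages();
        command.printResult("small", command.execute("small"), System.out::println);
    }
}
```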

On Tue, 5 Dec 2023 at 16:51, Abe Ratnofsky <a...@aber.io> wrote:
>
> Adding to Hari's comments:
>
> > Any changes expected at client/driver side? While using JMX/nodetool, it is 
> > clear that the command/operations are getting executed against which 
> > Cassandra node. But a client can connect to multiple hosts and trigger 
> > queries, then how can it ensure that commands are executed against the 
> > desired Cassandra instance?
>
> Clients are expected to set the node for the given CQL statement in cases 
> like this; see docstring for example: 
> https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/cql/Statement.java#L124-L147
>
> > The term COMMAND is a bit abstract I feel (subjective). Some of the 
> > examples quoted are referring to updating settings (for example: EXECUTE 
> > COMMAND setconcurrentcompactors WITH concurrent_compactors=5;) and some are 
> > referring to operations. Updating settings and running operations are 
> > considerably different things. They may have to be handled in their own 
> > way. And I also feel the settings part is overlapping with virtual tables. 
> > If virtual tables support writes (at least the settings virtual table), 
> > then settings can be updated using the virtual table itself.
>
> I agree with this - I actually think it would be clearer if this was referred 
> to as nodetool, if the set of commands is going to be largely based on 
> nodetool at the beginning. There is a lot of documentation online that 
> references nodetool by name, and changing the nomenclature would make that 
> existing documentation harder to understand. If a user can understand this as 
> "nodetool, but better and over CQL not JMX" I think that's a clearer 
> transition than a new concept of "commands".
>
> I understand that this proposal includes more than just nodetool, but there's 
> a benefit to having a tool with a name, and a web search for "cassandra 
> commands" is going to have more competition and ambiguity.
