Re: [DISCUSSION] CEP-38: CQL Management API

Jon Haddad Sun, 07 Jan 2024 19:43:25 -0800

I like the idea of being able to execute certain commands via CQL, but I
think it only makes sense for the nodetool commands that cause an action to
take place, such as compact or repair.  We already have virtual tables, so I
don't think we need another layer to run informational queries.  I see
little value in having the following (I'm using exec here for simplicity):


cqlsh> exec tpstats

which returns a string in addition to:

cqlsh> select * from system_views.thread_pools

which returns structured data.
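
To make the difference concrete, here is a minimal Java sketch of what a
client has to do in each case. It is not Cassandra code: PoolRow,
TpstatsContrast, the sample values, and the assumed text layout are all
made up for illustration. Only the idea of a name plus active and pending
task counts is borrowed from system_views.thread_pools.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical row type mirroring a few thread pool columns.
final class PoolRow {
    final String name;
    final int activeTasks;
    final int pendingTasks;

    PoolRow(String name, int activeTasks, int pendingTasks) {
        this.name = name;
        this.activeTasks = activeTasks;
        this.pendingTasks = pendingTasks;
    }
}

final class TpstatsContrast {
    // Structured path: filter directly on a typed field.
    static List<String> busyFromRows(List<PoolRow> rows) {
        List<String> busy = new ArrayList<>();
        for (PoolRow row : rows) {
            if (row.pendingTasks > 0) {
                busy.add(row.name);
            }
        }
        return busy;
    }

    // String path: the caller has to assume a layout and parse it.
    // Assumed layout: "<name> <active> <pending>", whitespace separated.
    private static final Pattern LINE =
        Pattern.compile("^(\\S+)\\s+(\\d+)\\s+(\\d+)\\s*$");

    static List<String> busyFromText(String text) {
        List<String> busy = new ArrayList<>();
        for (String line : text.split("\n")) {
            Matcher m = LINE.matcher(line);
            // Header and malformed lines simply fail to match and are skipped.
            if (m.matches() && Integer.parseInt(m.group(3)) > 0) {
                busy.add(m.group(1));
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real tpstats output.
        List<PoolRow> rows = Arrays.asList(
            new PoolRow("CompactionExecutor", 1, 4),
            new PoolRow("ReadStage", 0, 0));
        String text = "Pool Name          Active Pending\n"
                    + "CompactionExecutor 1      4\n"
                    + "ReadStage          0      0\n";
        System.out.println(busyFromRows(rows));
        System.out.println(busyFromText(text));
    }
}
```

Both paths give the same answer here, but the text path silently depends
on a column layout that nothing in the protocol guarantees.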

I'd also rather see updatable configuration virtual tables instead of

cqlsh> exec setcompactionthroughput 128

Fundamentally, I think it's better for the project if administration is
fully done over CQL and we have a consistent, single way of doing things.
I'm not dead set on it; I just think less is more in a lot of situations,
this being one of them.

Jon


On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov <mmu...@apache.org> wrote:

> Happy New Year to everyone! I'd like to thank everyone for their
> questions, because answering them forces us to move towards the right
> solution, and I also like the ML discussions for the time they give to
> investigate the code :-)
>
> I'm deliberately trying to limit the scope of the initial solution
> (e.g. exclude the agent part) to keep the discussion short and clear,
> but it's also important to have a glimpse of what we can do next once
> we've finished with the topic.
>
> My view of the Command<> is that it is an abstraction in the broader
> sense of an operation that can be performed on the local node,
> involving one of a few internal components. This means that updating a
> property in the settings virtual table via an update statement, or
> executing e.g. the setconcurrentcompactors command are just aliases of
> the same internal command via different APIs. Another example is the
> netstats command, which simply aggregates the MessageService metrics
> and returns them in a human-readable format (just another way of
> looking at key-value metric pairs). More broadly, a command takes a
> Map<String, String> as input and returns a String (or List<String>).
>
> As Abe mentioned, Command and CommandRegistry should be largely based
> on the nodetool command set at the beginning. We have a few options
> for how we can initially construct command metadata during the
> registry implementation (when moving command metadata from the
> nodetool to the core part), so I'm planning to compare against the
> command representations of the k8ssandra project so that any later
> registry adoption goes smoothly (by writing a test OpenAPI registry
> exporter and comparing the resulting representations).
>
> So, the MVP is the following:
> - Command
> - CommandRegistry
> - CQLCommandExporter
> - JMXCommandExporter
> - the nodetool uses the JMXCommandExporter
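
The MVP pieces listed above could fit together roughly as in the Java
sketch below. This is only an illustration under stated assumptions: the
thread names Command and CommandRegistry and says the input is a
Map<String, String> with a String result, but every method signature
here, plus SimpleCommand, NodeSettings, SetConcurrentCompactors and the
two Exporters methods, is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape: named arguments in, a String out.
interface SimpleCommand {
    String name();
    String execute(Map<String, String> args);
}

// Hypothetical registry: one name-to-command lookup shared by all exporters.
final class CommandRegistry {
    private final Map<String, SimpleCommand> commands = new HashMap<>();

    void register(SimpleCommand command) {
        commands.put(command.name(), command);
    }

    SimpleCommand get(String name) {
        SimpleCommand command = commands.get(name);
        if (command == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return command;
    }
}

// Stand-in for the node state that a real command would modify.
final class NodeSettings {
    int concurrentCompactors = 2;
}

final class SetConcurrentCompactors implements SimpleCommand {
    private final NodeSettings settings;

    SetConcurrentCompactors(NodeSettings settings) {
        this.settings = settings;
    }

    @Override
    public String name() {
        return "setconcurrentcompactors";
    }

    @Override
    public String execute(Map<String, String> args) {
        int value = Integer.parseInt(args.get("concurrent_compactors"));
        settings.concurrentCompactors = value;
        return "concurrent_compactors=" + value;
    }
}

// Two hypothetical front-ends that differ only in how the request arrives.
final class Exporters {
    // CQL-like path: the WITH key=value pairs are assumed to be parsed already.
    static String viaCql(CommandRegistry registry, String name,
                         Map<String, String> with) {
        return registry.get(name).execute(with);
    }

    // nodetool-like path: one positional value mapped onto a named parameter.
    static String viaNodetool(CommandRegistry registry, String name,
                              String paramName, String positionalValue) {
        return registry.get(name)
                       .execute(Collections.singletonMap(paramName, positionalValue));
    }

    public static void main(String[] args) {
        NodeSettings settings = new NodeSettings();
        CommandRegistry registry = new CommandRegistry();
        registry.register(new SetConcurrentCompactors(settings));

        System.out.println(viaCql(registry, "setconcurrentcompactors",
            Collections.singletonMap("concurrent_compactors", "5")));
        System.out.println(viaNodetool(registry, "setconcurrentcompactors",
            "concurrent_compactors", "8"));
    }
}
```

The point of the sketch is that both entry paths reach the same
implementation, so the command logic is written and maintained once.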
>
>
> = Answers =
>
> > What do you have in mind specifically there? Do you plan on rewriting a
> brand new implementation which would be partially inspired by our agent? Or
> would the project integrate our agent code in-tree or as a dependency?
>
> Personally, I like the state of the k8ssandra project as it is now. My
> understanding is that the server part of a database always lags behind
> the client and sidecar parts in terms of the jdk version and the
> features it provides. In contrast, sidecars should always be on top of
> the market, so if we want to make an agent part in-tree, this should
> be carefully considered for the flexibility which we may lose, as we
> will not be able to change the agent part within the sidecar. The
> nearest change I can see is that we can remove the interceptor part
> once the CQL command interface is available. I suggest we move the
> agent part to phase 2 and research it. wdyt?
>
>
> > How are the results of the commands expressed to the CQL client? Since
> the command is being treated as CQL, I guess it will be rows, right? If
> yes, some of the nodetool commands output are a bit hierarchical in nature
> (e.g. cfstats, netstats etc...). How are these cases handled?
>
> I think the result of the execution should be a simple string (or set
> of strings), which by its nature matches the nodetool output. I would
> avoid building complex output or output schemas for now to simplify
> the initial changes.
>
>
> > Any changes expected at client/driver side?
>
> I'd like to keep the initial changes to the server part only, to avoid
> scope inflation. For the driver part, I have checked the ExecutionInfo
> interface provided by the java-driver, which should probably be used
> as a command execution status holder. We'd like to have a unique
> command execution id for each command that is executed on the node, so
> the ExecutionInfo should probably hold such an id. Currently it has
> the UUID getTracingId(), which is not well suited for our case and I
> think further changes and follow-ups will be required here (including
> the binary protocol, I think).
>
>
> > The term COMMAND is a bit abstract I feel (subjective)... And I also
> feel the settings part is overlapping with virtual tables.
>
> I think we should keep the term Command as broad as possible. As
> long as we have a single implementation of a command, and the cost of
> maintaining that piece of the source code is low, it's even better if
> we have a few ways to achieve the same result using different APIs.
> Personally, the only thing I would vote for is the separation of
> command and metric terms (they shouldn't be mixed up).
>
>
> > How are the responses of different operations expressed through the
> Command API? If the Command Registry Adapters depend upon the command
> metadata for invoking/validating the command, then I think there has to be
> a way for them to interpret the response format also, right?
>
> I'm not sure I've understood the question correctly. Are you talking
> about the command execution result schema and the validation of that
> schema?
>
> For now, I see the interface as follows, the result of the execution
> is a type that can be converted to the same string as the nodetool has
> for the corresponding command (so that the outputs match):
>
> interface Command<A, R>
> {
>     void printResult(A argument, R result, Consumer<String> printer);
> }
>
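
A minimal Java sketch of how a printResult-style command might be used by
two different consumers follows. Only Command<A, R> and the printResult
parameters come from the message above; the execute method, GetThroughput,
PrintResultDemo and the output text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// execute() is an assumed addition; the thread only shows printResult.
interface Command<A, R> {
    R execute(A argument);

    void printResult(A argument, R result, Consumer<String> printer);
}

// Hypothetical command: returns a typed result and renders it as text.
final class GetThroughput implements Command<Void, Integer> {
    private final int mbPerSec;

    GetThroughput(int mbPerSec) {
        this.mbPerSec = mbPerSec;
    }

    @Override
    public Integer execute(Void argument) {
        return mbPerSec;
    }

    @Override
    public void printResult(Void argument, Integer result, Consumer<String> printer) {
        // Illustrative wording, not a guaranteed match for nodetool output.
        printer.accept("Current compaction throughput: " + result + " MB/s");
    }
}

final class PrintResultDemo {
    // Captures the printed lines, as an exporter returning strings might.
    static <A, R> List<String> render(Command<A, R> command, A argument) {
        List<String> lines = new ArrayList<>();
        R result = command.execute(argument);
        command.printResult(argument, result, lines::add);
        return lines;
    }

    public static void main(String[] args) {
        Command<Void, Integer> command = new GetThroughput(64);

        // Console-style consumer: print each line as it is produced.
        command.printResult(null, command.execute(null), System.out::println);

        // Collecting consumer: same text, kept as a list of strings.
        System.out.println(render(command, null));
    }
}
```

Because the printer is just a Consumer<String>, the same rendering code
can feed stdout or a list of result strings without the command knowing
which one it is writing to.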
> On Tue, 5 Dec 2023 at 16:51, Abe Ratnofsky <a...@aber.io> wrote:
> >
> > Adding to Hari's comments:
> >
> > > Any changes expected at client/driver side? While using JMX/nodetool,
> it is clear which Cassandra node the commands/operations are executed
> against. But a client can connect to multiple hosts and trigger
> queries, so how can it ensure that commands are executed against the
> desired Cassandra instance?
> >
> > Clients are expected to set the node for the given CQL statement in
> cases like this; see docstring for example:
> https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/cql/Statement.java#L124-L147
> >
> > > The term COMMAND is a bit abstract I feel (subjective). Some of the
> examples quoted are referring to updating settings (for example: EXECUTE
> COMMAND setconcurrentcompactors WITH concurrent_compactors=5;) and some are
> referring to operations. Updating settings and running operations are
> considerably different things. They may have to be handled in their own
> way. And I also feel the settings part is overlapping with virtual tables.
> If virtual tables support writes (at least the settings virtual table),
> then settings can be updated using the virtual table itself.
> >
> > I agree with this - I actually think it would be clearer if this was
> referred to as nodetool, if the set of commands is going to be largely
> based on nodetool at the beginning. There is a lot of documentation online
> that references nodetool by name, and changing the nomenclature would make
> that existing documentation harder to understand. If a user can understand
> this as "nodetool, but better and over CQL not JMX" I think that's a
> clearer transition than a new concept of "commands".
> >
> > I understand that this proposal includes more than just nodetool, but
> there's a benefit to having a tool with a name, and a web search for
> "cassandra commands" is going to have more competition and ambiguity.
>
