I like the idea of being able to execute certain commands via CQL, but I think it only makes sense for the nodetool commands that cause an action to take place, such as compact or repair. We already have virtual tables, so I don't think we need another layer to run informational queries. I see little value in having the following (I'm using exec here for simplicity):

cqlsh> exec tpstats

which returns a string, in addition to:

cqlsh> select * from system_views.thread_pools

which returns structured data.

I'd also rather see updatable configuration virtual tables instead of

cqlsh> exec setcompactionthroughput 128

Fundamentally, I think it's better for the project if administration is fully done over CQL and we have a consistent, single way of doing things. I'm not dead set on it, I just think less is more in a lot of situations, this being one of them.

Jon

On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov <mmu...@apache.org> wrote:

> Happy New Year to everyone! I'd like to thank everyone for their questions, because answering them forces us to move towards the right solution, and I also like the ML discussions for the time they give to investigate the code :-)
>
> I'm deliberately trying to limit the scope of the initial solution (e.g. exclude the agent part) to keep the discussion short and clear, but it's also important to have a glimpse of what we can do next once we've finished with the topic.
>
> My view of the Command<> is that it is an abstraction in the broader sense of an operation that can be performed on the local node, involving one of a few internal components. This means that updating a property in the settings virtual table via an update statement, or executing e.g. the setconcurrentcompactors command, are just aliases of the same internal command via different APIs. Another example is the netstats command, which simply aggregates the MessageService metrics and returns them in a human-readable format (just another way of looking at key-value metric pairs). More broadly, the command takes a Map<String, String> as input and returns a String (or List<String>) as the result.
>
> As Abe mentioned, Command and CommandRegistry should be largely based on the nodetool command set at the beginning.
> We have a few options for how we can initially construct command metadata during the registry implementation (when moving command metadata from the nodetool to the core part), so I'm planning to consult the command representations of the k8ssandra project so that any further registry adoptions have zero problems (by writing a test OpenAPI registry exporter and comparing the representation results).
>
> So, the MVP is the following:
> - Command
> - CommandRegistry
> - CQLCommandExporter
> - JMXCommandExporter
> - the nodetool uses the JMXCommandExporter
>
>
> = Answers =
>
> > What do you have in mind specifically there? Do you plan on rewriting a brand new implementation which would be partially inspired by our agent? Or would the project integrate our agent code in-tree or as a dependency?
>
> Personally, I like the state of the k8ssandra project as it is now. My understanding is that the server part of a database always lags behind the client and sidecar parts in terms of the JDK version and the features it provides. In contrast, sidecars should always be on top of the market, so if we want to make an agent part in-tree, this should be carefully considered for the flexibility which we may lose, as we will not be able to change the agent part within the sidecar. The nearest change I can see is that we can remove the interceptor part once the CQL command interface is available. I suggest we move the agent part to phase 2 and research it. wdyt?
>
> > How are the results of the commands expressed to the CQL client? Since the command is being treated as CQL, I guess it will be rows, right? If yes, some of the nodetool commands' outputs are a bit hierarchical in nature (e.g. cfstats, netstats etc...). How are these cases handled?
>
> I think the result of the execution should be a simple string (or set of strings), which by its nature matches the nodetool output.
> I would avoid building complex output or output schemas for now to simplify the initial changes.
>
> > Any changes expected at client/driver side?
>
> I'd like to keep the initial changes to the server part only, to avoid scope inflation. For the driver part, I have checked the ExecutionInfo interface provided by the java-driver, which should probably be used as a command execution status holder. We'd like to have a unique command execution id for each command that is executed on the node, so the ExecutionInfo should probably hold such an id. Currently it has the UUID getTracingId(), which is not well suited for our case, and I think further changes and follow-ups will be required here (including the binary protocol, I think).
>
> > The term COMMAND is a bit abstract I feel (subjective)... And I also feel the settings part is overlapping with virtual tables.
>
> I think we should keep the term Command as broad as possible. As long as we have a single implementation of a command, and the cost of maintaining that piece of the source code is low, it's even better if we have a few ways to achieve the same result using different APIs. Personally, the only thing I would vote for is the separation of command and metric terms (they shouldn't be mixed up).
>
> > How are the responses of different operations expressed through the Command API? If the Command Registry Adapters depend upon the command metadata for invoking/validating the command, then I think there has to be a way for them to interpret the response format also, right?
>
> I'm not sure that I've understood the question correctly. Are you talking about the command execution result schema and the validation of that schema?
>
> For now, I see the interface as follows: the result of the execution is a type that can be converted to the same string as the nodetool has for the corresponding command (so that the outputs match):
>
> interface Command<A, R>
> {
>     void printResult(A argument, R result, Consumer<String> printer);
> }
>
> On Tue, 5 Dec 2023 at 16:51, Abe Ratnofsky <a...@aber.io> wrote:
> >
> > Adding to Hari's comments:
> >
> > > Any changes expected at client/driver side? While using JMX/nodetool, it is clear which Cassandra node the commands/operations are executed against. But a client can connect to multiple hosts and trigger queries, so how can it ensure that commands are executed against the desired Cassandra instance?
> >
> > Clients are expected to set the node for the given CQL statement in cases like this; see docstring for example: https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/cql/Statement.java#L124-L147
> >
> > > The term COMMAND is a bit abstract I feel (subjective). Some of the examples quoted are referring to updating settings (for example: EXECUTE COMMAND setconcurrentcompactors WITH concurrent_compactors=5;) and some are referring to operations. Updating settings and running operations are considerably different things. They may have to be handled in their own way. And I also feel the settings part is overlapping with virtual tables. If virtual tables support writes (at least the settings virtual table), then settings can be updated using the virtual table itself.
> >
> > I agree with this - I actually think it would be clearer if this was referred to as nodetool, if the set of commands is going to be largely based on nodetool at the beginning. There is a lot of documentation online that references nodetool by name, and changing the nomenclature would make that existing documentation harder to understand.
> > If a user can understand this as "nodetool, but better and over CQL not JMX" I think that's a clearer transition than a new concept of "commands".
> >
> > I understand that this proposal includes more than just nodetool, but there's a benefit to having a tool with a name, and a web search for "cassandra commands" is going to have more competition and ambiguity.
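
For readers following the thread, here is a rough Java sketch of how the Command and CommandRegistry pieces discussed in the quoted messages might fit together. It is an illustration under assumptions, not the proposal's actual code: only the Command<A, R> name, the printResult signature, the Map<String, String> input and the string-based output come from the thread. The name(), parseArguments() and execute() methods, the registry's run() method, the in-memory setting and the printed text are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Command<A, R> and printResult follow the shape quoted in the thread;
// name(), parseArguments() and execute() are hypothetical additions.
interface Command<A, R> {
    String name();
    A parseArguments(Map<String, String> raw); // thread: command input is Map<String, String>
    R execute(A argument);
    void printResult(A argument, R result, Consumer<String> printer);
}

// Hypothetical registry: looks a command up by name and returns its
// output as lines of text, so any exporter (CQL, JMX) can reuse it.
final class CommandRegistry {
    private final Map<String, Command<?, ?>> commands = new LinkedHashMap<>();

    void register(Command<?, ?> command) {
        commands.put(command.name(), command);
    }

    List<String> run(String name, Map<String, String> rawArgs) {
        Command<?, ?> command = commands.get(name);
        if (command == null) {
            throw new IllegalArgumentException("Unknown command: " + name);
        }
        return runTyped(command, rawArgs);
    }

    // Helper that captures the wildcard types so A and R line up.
    private static <A, R> List<String> runTyped(Command<A, R> command, Map<String, String> rawArgs) {
        A argument = command.parseArguments(rawArgs);
        R result = command.execute(argument);
        List<String> lines = new ArrayList<>();
        command.printResult(argument, result, lines::add);
        return lines;
    }
}

// Toy command: the field below stands in for real node state, and the
// printed text is invented for this sketch (not nodetool's real output).
final class SetConcurrentCompactors implements Command<Integer, Integer> {
    private int current = 2;

    public String name() {
        return "setconcurrentcompactors";
    }

    public Integer parseArguments(Map<String, String> raw) {
        return Integer.parseInt(raw.get("concurrent_compactors"));
    }

    public Integer execute(Integer argument) {
        int previous = current;
        current = argument;
        return previous; // result is the value that was replaced
    }

    public void printResult(Integer argument, Integer previous, Consumer<String> printer) {
        printer.accept("concurrent_compactors: " + previous + " -> " + argument);
    }
}

final class CommandSketch {
    public static void main(String[] args) {
        CommandRegistry registry = new CommandRegistry();
        registry.register(new SetConcurrentCompactors());
        // Prints: concurrent_compactors: 2 -> 5
        registry.run("setconcurrentcompactors", Map.of("concurrent_compactors", "5"))
                .forEach(System.out::println);
    }
}
```

In this sketch the registry only ever hands back lines of text, which matches the "simple string (or set of strings)" result discussed in the thread; a structured result schema would need a different return type.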