> Fundamentally, I think it's better for the project if administration is fully
> done over CQL and we have a consistent, single way of doing things.

Strongly agree here, with two caveats:

1. Supporting backwards compat, especially for automated ops (i.e. nodetool,
   JMX, etc.), is crucial. Painful, but crucial.
2. We need something that's available for use before the node comes fully
   online; this is the point Jeff always brings up when we discuss moving away
   from JMX. So long as we have some kind of "out-of-band" access to nodes, or
   an accommodation for that, we should be good.

For context on point 2, see Slack:
https://the-asf.slack.com/archives/CK23JSY2K/p1688745128122749?thread_ts=1688662169.018449&cid=CK23JSY2K
> I point out that JMX works before and after the native protocol is running
> (startup, shutdown, joining, leaving), and also it's semi-common for us to
> disable the native protocol in certain circumstances, so at the very least,
> we'd then need to implement a totally different cql protocol interface just
> for administration, which nobody has committed to building yet.

I think this is a solvable problem, and I think the benefits of having a
single, elegant way of interacting with a cluster and configuring it justify
the investment for us as a project. Assuming someone has the cycles to, you
know, actually do the work. :D

On Sun, Jan 7, 2024, at 10:41 PM, Jon Haddad wrote:
> I like the idea of the ability to execute certain commands via CQL, but I
> think it only makes sense for the nodetool commands that cause an action to
> take place, such as compact or repair. We already have virtual tables; I
> don't think we need another layer to run informational queries. I see little
> value in having the following (I'm using exec here for simplicity):
>
> cqlsh> exec tpstats
>
> which returns a string, in addition to:
>
> cqlsh> select * from system_views.thread_pools
>
> which returns structured data.
>
> I'd also rather see updatable configuration virtual tables instead of
>
> cqlsh> exec setcompactionthroughput 128
>
> Fundamentally, I think it's better for the project if administration is fully
> done over CQL and we have a consistent, single way of doing things. I'm not
> dead set on it, I just think less is more in a lot of situations, this being
> one of them.
>
> Jon
>
>
> On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov <mmu...@apache.org> wrote:
>> Happy New Year to everyone! I'd like to thank everyone for their
>> questions, because answering them forces us to move towards the right
>> solution, and I also like the ML discussions for the time they give to
>> investigate the code :-)
>>
>> I'm deliberately trying to limit the scope of the initial solution
>> (e.g. exclude the agent part) to keep the discussion short and clear,
>> but it's also important to have a glimpse of what we can do next once
>> we've finished with this topic.
>>
>> My view of the Command<> is that it is an abstraction in the broader
>> sense of an operation that can be performed on the local node,
>> involving one of a few internal components. This means that updating a
>> property in the settings virtual table via an update statement and
>> executing e.g. the setconcurrentcompactors command are just aliases of
>> the same internal command, exposed via different APIs. Another example
>> is the netstats command, which simply aggregates the MessagingService
>> metrics and returns them in a human-readable format (just another way
>> of looking at key-value metric pairs). More broadly, the command input
>> is a Map<String, String> and the result is a String (or a List<String>).
>>
>> As Abe mentioned, Command and CommandRegistry should be largely based
>> on the nodetool command set at the beginning. We have a few options
>> for how we can initially construct command metadata during the
>> registry implementation (when moving command metadata from nodetool
>> to the core part), so I'm planning to consult the command
>> representations of the k8ssandra project, so that any further registry
>> adoption has zero problems (by writing a test OpenAPI registry
>> exporter and comparing the representation results).
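(To make the shape Maxim describes above concrete: a minimal, hypothetical
Java sketch of a Command plus CommandRegistry along those lines could look
like the following. The names, signatures, and the plain-string result shape
are illustrative assumptions, not the proposed Cassandra API.)

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: a command takes string key/value arguments and
// returns human-readable lines, matching the "Map<String, String> in,
// String(s) out" shape described above.
interface Command
{
    String name();                                  // e.g. "setconcurrentcompactors"
    List<String> execute(Map<String, String> args); // nodetool-style output lines
}

// A registry that exporters (CQL, JMX/nodetool) could resolve commands from.
final class CommandRegistry
{
    private final Map<String, Command> commands = new ConcurrentHashMap<>();

    public void register(Command command)
    {
        commands.put(command.name(), command);
    }

    public Command lookup(String name)
    {
        Command command = commands.get(name);
        if (command == null)
            throw new IllegalArgumentException("Unknown command: " + name);
        return command;
    }
}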
>>
>> So, the MVP is the following:
>> - Command
>> - CommandRegistry
>> - CQLCommandExporter
>> - JMXCommandExporter
>> - nodetool uses the JMXCommandExporter
>>
>>
>> = Answers =
>>
>> > What do you have in mind specifically there? Do you plan on rewriting a
>> > brand new implementation which would be partially inspired by our agent?
>> > Or would the project integrate our agent code in-tree or as a dependency?
>>
>> Personally, I like the state of the k8ssandra project as it is now. My
>> understanding is that the server part of a database always lags behind
>> the client and sidecar parts in terms of the JDK version and the
>> features it provides. In contrast, sidecars should always be on top of
>> the market, so if we want to bring the agent part in-tree, this should
>> be carefully considered given the flexibility we would lose, as we
>> would no longer be able to change the agent part within the sidecar.
>> The closest change I can see is that we can remove the interceptor part
>> once the CQL command interface is available. I suggest we move the
>> agent part to phase 2 and research it then. wdyt?
>>
>>
>> > How are the results of the commands expressed to the CQL client? Since the
>> > command is being treated as CQL, I guess it will be rows, right? If yes,
>> > some of the nodetool command outputs are a bit hierarchical in nature
>> > (e.g. cfstats, netstats, etc.). How are these cases handled?
>>
>> I think the result of the execution should be a simple string (or set
>> of strings), which by its nature matches the nodetool output. I would
>> avoid building complex output or output schemas for now, to simplify
>> the initial changes.
>>
>>
>> > Any changes expected at client/driver side?
>>
>> I'd like to keep the initial changes to the server part only, to avoid
>> scope inflation. For the driver part, I have checked the ExecutionInfo
>> interface provided by the java-driver, which should probably be used
>> as a command execution status holder. We'd like to have a unique
>> command execution id for each command that is executed on the node, so
>> the ExecutionInfo should probably hold such an id. Currently it has
>> UUID getTracingId(), which is not well suited to our case, and I
>> think further changes and follow-ups will be required here (including
>> the binary protocol, I think).
>>
>>
>> > The term COMMAND is a bit abstract I feel (subjective)... And I also feel
>> > the settings part is overlapping with virtual tables.
>>
>> I think we should keep the term Command as broad as possible. As long
>> as we have a single implementation of a command, and the cost of
>> maintaining that piece of the source code is low, it's even better if
>> we have a few ways to achieve the same result using different APIs.
>> Personally, the only thing I would vote for is the separation of the
>> command and metric terms (they shouldn't be mixed up).
>>
>>
>> > How are the responses of different operations expressed through the
>> > Command API? If the Command Registry Adapters depend upon the command
>> > metadata for invoking/validating the command, then I think there has to be
>> > a way for them to interpret the response format also, right?
>>
>> I'm not sure I've got the question right. Are you talking about the
>> command execution result schema and the validation of that schema?
>>
>> For now, I see the interface as follows: the result of the execution
>> is a type that can be converted to the same string that nodetool
>> produces for the corresponding command (so that the outputs match):
>>
>> interface Command<A, R>
>> {
>>     void printResult(A argument, R result, Consumer<String> printer);
>> }
>>
>> On Tue, 5 Dec 2023 at 16:51, Abe Ratnofsky <a...@aber.io> wrote:
>> >
>> > Adding to Hari's comments:
>> >
>> > > Any changes expected at client/driver side? While using JMX/nodetool, it
>> > > is clear which Cassandra node the command/operations are executed
>> > > against. But a client can connect to multiple hosts and trigger
>> > > queries, so how can it ensure that commands are executed against the
>> > > desired Cassandra instance?
>> >
>> > Clients are expected to set the node for the given CQL statement in cases
>> > like this; see the docstring for an example:
>> > https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/java/com/datastax/oss/driver/api/core/cql/Statement.java#L124-L147
>> >
>> > > The term COMMAND is a bit abstract I feel (subjective). Some of the
>> > > examples quoted are referring to updating settings (for example: EXECUTE
>> > > COMMAND setconcurrentcompactors WITH concurrent_compactors=5;) and some
>> > > are referring to operations. Updating settings and running operations
>> > > are considerably different things. They may have to be handled in their
>> > > own ways. And I also feel the settings part is overlapping with virtual
>> > > tables. If virtual tables support writes (at least the settings virtual
>> > > table), then settings can be updated using the virtual table itself.
>> >
>> > I agree with this - I actually think it would be clearer if this was
>> > referred to as nodetool, if the set of commands is going to be largely
>> > based on nodetool at the beginning. There is a lot of documentation online
>> > that references nodetool by name, and changing the nomenclature would make
>> > that existing documentation harder to understand. If a user can understand
>> > this as "nodetool, but better and over CQL rather than JMX", I think that's
>> > a clearer transition than a new concept of "commands".
>> >
>> > I understand that this proposal includes more than just nodetool, but
>> > there's a benefit to having a tool with a name, and a web search for
>> > "cassandra commands" is going to have more competition and ambiguity.
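(As a concrete illustration of the Statement docstring Abe links to: with the
4.x java-driver a client can pin a statement to a particular node via
setNode(). The snippet below is a minimal sketch, assuming a locally reachable
cluster; the class name, the node-selection logic, and the virtual-table query
are illustrative choices, not part of the proposal.)

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import com.datastax.oss.driver.api.core.metadata.Node;

public class PinnedStatementExample
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            // Pick the target node; a real tool would select it by address or
            // host id rather than taking the first node the driver knows about.
            Node target = session.getMetadata().getNodes().values().iterator().next();

            // setNode() asks the driver to route this statement to that node,
            // which is how a CQL-based "command" would be addressed to a single
            // instance. The virtual table query is just an example payload.
            SimpleStatement statement =
                SimpleStatement.newInstance("SELECT * FROM system_views.thread_pools")
                               .setNode(target);

            ResultSet rs = session.execute(statement);
            rs.forEach(row -> System.out.println(row.getFormattedContents()));
        }
    }
}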