RE: Survey about the parsing of the tooling's output

Durity, Sean R via user Mon, 10 Jul 2023 08:06:51 -0700

We also parse the output from nodetool info and nodetool status and (to a 
lesser degree) nodetool netstats. We have basically made info and status more 
operator-friendly in a multi-cluster environment. (And we added a usable 
return value to our info command that we can use to evaluate the node’s 
health.) While changes to the output wouldn’t be significantly difficult to 
adapt to, there is the cost multiplier of deploying to hundreds of nodes across 
multiple clusters, plus all the testing and approvals that are required. I would 
agree with “only on major releases” as a rule to follow.
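
A minimal sketch of how a wrapper could turn info-style output into a return 
value. This is not the actual wrapper described above: the function names, the 
choice of required fields, and the sample text are assumptions made for 
illustration, and the sample is not verbatim nodetool output.

```python
def parse_key_values(text):
    """Parse 'Key : value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def health_exit_code(info_text, required=("Gossip active", "Native Transport active")):
    """Return 0 when every required field reads 'true', otherwise 1."""
    fields = parse_key_values(info_text)
    healthy = all(fields.get(name, "").lower() == "true" for name in required)
    return 0 if healthy else 1


# Illustrative sample only; a real script would capture the tool's output instead.
SAMPLE_INFO = """\
Gossip active          : true
Native Transport active: false
Load                   : 103.5 KiB
"""
```

A script could pass the result to sys.exit() so that monitoring only has to 
check the exit status.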


Zero desire to get JSON or YAML outputs – no, thank you. CQL/virtual tables is 
a good, additional goal. Other databases have had this kind of feature for a 
long time.

Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra



From: Bowen Song via user <user@cassandra.apache.org>
Sent: Monday, July 10, 2023 7:25 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Survey about the parsing of the tooling's output

We parse the output of the following nodetool sub-commands in our custom 
scripts:

  *   status
  *   netstats
  *   tpstats
  *   ring

We don't mind the output format change between major releases as long as all 
the following are true:

  1.  major releases are not too frequent,
      e.g. no more frequent than once every couple of years
  2.  the changes are clearly documented in CHANGES.txt and mentioned in 
      NEWS.txt, e.g. clearly stating that "someStatistic:" in 
      "nodetool somecommand" is renamed to "Some Statistic:"
  3.  no functionality is lost,
      e.g. no value is removed from the output without an obvious alternative
  4.  the output doesn't become a lot harder to parse,
      e.g. a value is not split into multiple values with different units 
      that must be added together to recover the original one
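
To illustrate the extra work in point 4, here is a rough sketch of what a 
script would need if one value were split into several values with different 
units. The unit table, the function names, and the example values are all 
hypothetical; no actual nodetool command is known to print them this way.

```python
# Binary unit multipliers; assumed for illustration.
UNIT_BYTES = {
    "bytes": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
}


def to_bytes(value):
    """Convert a string such as '1.5 GiB' into a number of bytes."""
    number, unit = value.split()
    return float(number) * UNIT_BYTES[unit]


def total_bytes(values):
    """Add up values that may each use a different unit."""
    return sum(to_bytes(v) for v in values)
```

With a single value, the script reads one field; with split values, every 
consumer has to carry a unit table like this one and redo the sum.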

We have Ansible playbooks, shell scripts, Python scripts, etc. parsing the 
output, and to the best of my knowledge, all of them are trivial to rework for 
minor cosmetic changes like the one given in the example.
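
As a sketch of why such a rename is cheap to absorb, a parser can normalize 
keys before looking them up. The function names below are made up for this 
example, and "someStatistic" is the placeholder from the original question, not 
a real nodetool field.

```python
import re


def normalize_key(key):
    """Lower-case a key and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def parse_stats(text):
    """Parse 'key: value' lines into a dict keyed by the normalized name."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[normalize_key(key)] = value.strip()
    return stats
```

Both "someStatistic: 10" and "Some Statistic: 10" end up under the key 
"somestatistic", so the calling script does not need to change.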

Parsing JSON or YAML in vanilla POSIX shell (i.e. without tools such as jq 
installed) can be much harder, and we would rather not have to deal with that. 
For Ansible and Python scripts it's a non-issue, but given that we are 
already parsing the default output and it works fine, we are unlikely to change 
them to use JSON or YAML instead, unless the pain of dealing with breaking 
changes becomes too much and too frequent.

Querying via CQL is harder, and we would rather not do that for the reasons 
below:

  *   it requires Cassandra credentials, unlike the credential-less nodetool 
command on localhost
  *   for shell scripts, the cqlsh command output is harder to parse than the 
nodetool output, because it is a human-friendly table with a header, 
dynamic indentation, field separators, etc., which makes it a less attractive 
candidate than nodetool
  *   for Ansible and Python scripts, using the CQL interface will require 
extra modules/libraries. The extra installation steps make the scripts 
themselves less portable between different servers/environments, so we may still 
prefer the more portable nodetool approach where localhost access is 
possible


On 10/07/2023 10:35, Miklosovic, Stefan wrote:

Hi Cassandra users,



I am a Cassandra developer, and we in the Cassandra project would love to know 
if there are users out there who parse the output of the tooling, such as 
nodetool, and for whom that output is therefore important.



We are weighing the consequences of changing nodetool's output for various 
commands. We are not sure whether users parse this output in their custom 
scripts, in which case changing the output would break those scripts.



Additionally, how big a problem would the output change be if it happened 
only between major Cassandra versions, e.g. 4.0 -> 5.0 or 5.0 -> 6.0? 
In other words, there would be a guarantee that no breaking changes would ever 
occur in minor versions, only in majors.



Is there anybody out there relying on the output of particular nodetool 
commands (or any command in tools/bin) in production? How often do you rely on 
parsing nodetool's output, and how much work would it be for you to 
adapt to some minor changes? For example, the tool currently prints 
"someStatistic: 10" and we would change it to "Some Statistic: 10".



Would you be OK with the output changing if you had a way to get e.g. 
JSON or YAML output instead via some flag on the nodetool command, so that it 
would be irrelevant what the default output is?



It would be appreciated a lot if you gave us more feedback on this. I 
understand that not all questions are relatable to everyone.



Even if you are not relying on the output of the tooling in custom scripts 
that parse it, please tell us so. We are progressively trying to provide 
a CQL way to query the internal state of Cassandra, via virtual tables, for 
example.



Regards



Stefan Miklosovic

