Re: Survey about the parsing of the tooling's output
We used to have monitoring scripts that parsed the output of nodetool (status, listsnapshots), but these have been replaced with Jolokia (a REST interface to JMX), which is both more powerful and easier to parse for things like monitoring scripts. We *really* would have loved to have JSON output back when we were just starting out, though.

These days nodetool is only run manually, by humans from a shell, so changes to the output would not be an issue.

- Max

> On Jul 10, 2023, at 2:35 am, Miklosovic, Stefan wrote:
>
> Hi Cassandra users,
>
> I am a Cassandra developer and we in the Cassandra project would love to know if there are users out there for whom the output of the tooling, like nodetool, is important when it comes to parsing it.
>
> We are evaluating the consequences of changing nodetool's output for various commands. We are not completely sure whether users parse this output in their custom scripts, in which case changing the output would break those scripts.
>
> Additionally, how big of a problem would an output change be if it happened only between major Cassandra versions, e.g. 4.0 -> 5.0 or 5.0 -> 6.0? In other words, there would be a guarantee that no breaking changes would ever occur in minor versions, only in majors.
>
> Is anybody out there relying on the output of particular nodetool commands (or any command in tools/bin) in production? How often do you rely on parsing nodetool's output, and how much work would it be for you to adapt to minor changes? For example, when the tool prints "someStatistic: 10" and we change it to "Some Statistic: 10".
>
> Would you be OK with the output changing if a flag on the nodetool command let you get e.g. JSON or YAML output instead, so that the default output format would be irrelevant?
>
> We would appreciate it a lot if you gave us more feedback on this. I understand that not all questions are relevant to everyone.
>
> Even if you are not relying on the output of the tooling in custom scripts that parse it, please tell us so. We are progressively trying to provide a CQL way to query the internal state of Cassandra, for example via virtual tables.
>
> Regards
>
> Stefan Miklosovic
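For readers unfamiliar with the Jolokia approach Max mentions, here is a minimal sketch of what reading a JMX attribute over HTTP might look like. It is not taken from the thread. The agent address (`http://localhost:8778/jolokia`), the choice of the `LiveNodes` attribute on the `StorageService` MBean, and the helper names `read_url`, `extract_value` and `live_nodes` are all assumptions made for illustration; MBean names containing a slash would need Jolokia's own escaping, which is not handled here.

```python
import json
from urllib.request import urlopen

# Assumed values for illustration only; adjust to the local setup.
JOLOKIA_BASE = "http://localhost:8778/jolokia"
STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService"


def read_url(mbean, attribute, base=JOLOKIA_BASE):
    """Build a Jolokia GET 'read' URL for one MBean attribute."""
    return f"{base}/read/{mbean}/{attribute}"


def extract_value(body):
    """Return the 'value' field of a Jolokia JSON response, or raise."""
    doc = json.loads(body)
    if doc.get("status") != 200:
        raise RuntimeError(f"Jolokia error: {doc.get('error', 'unknown')}")
    return doc["value"]


def live_nodes(base=JOLOKIA_BASE):
    """Fetch the list of live nodes over HTTP (needs a running agent)."""
    url = read_url(STORAGE_SERVICE, "LiveNodes", base)
    with urlopen(url, timeout=5) as resp:
        return extract_value(resp.read().decode("utf-8"))
```

The response is already structured JSON, so the script never depends on column widths or label spelling in nodetool's human-readable output.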
Re: Survey about the parsing of the tooling's output
I am sorry, this is the correct link:

https://lists.apache.org/thread/72j5qfgbttjcmylhcmfq1ptboh641ns0

From: Miklosovic, Stefan
Sent: Wednesday, July 12, 2023 0:08
To: user@cassandra.apache.org
Subject: Re: Survey about the parsing of the tooling's output

Thank you very much for your valuable feedback and insights. There is a thread (1) where we are discussing this as well. We should come up with some decisions; you are welcome to participate in or follow the discussion there if you wish.

(1) https://lists.apache.org/list.html?d...@cassandra.apache.org
Re: Survey about the parsing of the tooling's output
Thank you very much for your valuable feedback and insights. There is a thread (1) where we are discussing this as well. We should come up with some decisions; you are welcome to participate in or follow the discussion there if you wish.

(1) https://lists.apache.org/list.html?d...@cassandra.apache.org

From: Andrew Weaver
Sent: Monday, July 10, 2023 17:37
To: user@cassandra.apache.org
Subject: Re: Survey about the parsing of the tooling's output

+1 to Bowen Song's feedback for the most part.

We have processes that parse output from these nodetool commands:

* info
* netstats
* status
* version

My opinion is that anyone running a reasonably sized fleet of Cassandra nodes will have different flavors of automation: some things running on the nodes themselves, where nodetool is very handy, and some things running outside the cluster, where virtual tables accessed via CQL are preferred.

I propose a rule that within a given major version, additional lines of output are acceptable changes, but changes to the format of existing lines of output are forbidden.

I would be inclined to accept JSON or YAML output from nodetool for Ruby/Python/etc. scripts, but for bash, the human-readable output is more workable.
Re: Survey about the parsing of the tooling's output
+1 to Bowen Song's feedback for the most part.

We have processes that parse output from these nodetool commands:

- info
- netstats
- status
- version

My opinion is that anyone running a reasonably sized fleet of Cassandra nodes will have different flavors of automation: some things running on the nodes themselves, where nodetool is very handy, and some things running outside the cluster, where virtual tables accessed via CQL are preferred.

I propose a rule that within a given major version, additional lines of output are acceptable changes, but changes to the format of existing lines of output are forbidden.

I would be inclined to accept JSON or YAML output from nodetool for Ruby/Python/etc. scripts, but for bash, the human-readable output is more workable.

--
Andrew Weaver
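The rule proposed above (new lines may appear within a major version, existing lines keep their format) is what makes a simple "label: value" parser safe. Below is a minimal sketch of such a parser, not something posted in the thread: the function names `parse_key_values` and `pick` are hypothetical, and the sample text in the usage note is hand-written in the style of `nodetool info`, not captured from a real node.

```python
def parse_key_values(text):
    """Parse 'label : value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def pick(text, wanted):
    """Return only the wanted labels; unknown extra lines are ignored.

    A KeyError is raised if a wanted label is missing, so a renamed
    label fails loudly instead of silently yielding nothing.
    """
    fields = parse_key_values(text)
    return {label: fields[label] for label in wanted}
```

With an illustrative input such as `"Gossip active : true\nLoad : 1.5 GiB\n"`, calling `pick(text, ("Gossip active", "Load"))` returns the same dict whether or not a later release appends further lines, which is the compatibility property the rule is meant to protect.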
RE: Survey about the parsing of the tooling's output
We also parse the output from nodetool info and nodetool status and (to a lesser degree) nodetool netstats. We have basically made info and status more operator-friendly in a multi-cluster environment. (And we added a usable return value to our info command that we can use to evaluate the node's health.)

While changes to the output wouldn't be significantly difficult to adapt to, there is the cost multiplier of deploying to hundreds of nodes across multiple clusters and all the testing and approvals that are required. I would agree with "only on major releases" as a rule to follow.

Zero desire to get JSON or YAML outputs. No, thank you.

CQL/virtual tables is a good, additional goal. Other databases have had this kind of feature for a long time.

Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra

From: Bowen Song via user
Sent: Monday, July 10, 2023 7:25 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Survey about the parsing of the tooling's output

We parse the output of the following nodetool sub-commands in our custom scripts:

* status
* netstats
* tpstats
* ring

We don't mind the output format changing between major releases as long as all of the following are true:

1. major releases are not too frequent
   e.g. no more frequent than once every couple of years
2. the changes are clearly documented in CHANGES.txt and mentioned in NEWS.txt
   e.g. clearly specify that "someStatistic:" in "nodetool somecommand" is renamed to "Some Statistic:"
3. the functionality is not lost
   e.g. a value is not removed from the output with no obvious alternative
4. it doesn't become a lot harder to parse
   e.g. a value is not split into multiple values with different units that need to be added up to get the original one

We have Ansible playbooks, shell scripts, Python scripts, etc. parsing the output, and to the best of my knowledge, all of them are trivial to rework for minor cosmetic changes like the one given in the example.

Parsing JSON or YAML in vanilla POSIX shell (i.e. without tools such as jq installed) can be much harder, and we would rather not have to deal with that. For Ansible and Python scripts it's a non-issue, but given that we are already parsing the default output and it works fine, we are unlikely to change them to use JSON or YAML instead, unless the pain of dealing with breaking changes is too much and too often.

Querying via CQL is harder, and we would rather not do that for the reasons below:

* it requires Cassandra credentials, instead of the credential-less nodetool command on localhost
* for shell scripts, the cqlsh command output is harder to parse than the nodetool command's, because its output is a human-friendly table with a header, dynamic indentation, field separators, etc., which makes it a less attractive candidate than nodetool
* for Ansible and Python scripts, using the CQL interface will require extra modules/libraries. The extra installation steps required make the scripts themselves less portable between different servers/environments, so we may still prefer the more portable nodetool approach where localhost access is possible
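Sean mentions a wrapper around the info command that yields a usable return value for judging node health, without describing it. A rough sketch of that idea follows, under stated assumptions: the health criteria (the "Gossip active" and "Native Transport active" lines of `nodetool info` both reading `true`), the exit codes, and the names `REQUIRED_TRUE`, `health_exit_code` and `main` are hypothetical and not Sean's actual implementation.

```python
import subprocess

# Assumed health criteria for illustration: these labels must read "true".
REQUIRED_TRUE = ("Gossip active", "Native Transport active")


def health_exit_code(info_text):
    """Map 'nodetool info'-style text to 0 (healthy) or 1 (unhealthy)."""
    fields = {}
    for line in info_text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    for label in REQUIRED_TRUE:
        # A missing label counts as unhealthy, so a renamed line is noticed.
        if fields.get(label) != "true":
            return 1
    return 0


def main():
    """Run nodetool locally; return 2 if the command itself fails."""
    proc = subprocess.run(["nodetool", "info"], capture_output=True, text=True)
    if proc.returncode != 0:
        return 2
    return health_exit_code(proc.stdout)
```

A wrapper script would end with `sys.exit(main())`, so callers such as cron jobs or orchestration tools can branch on the exit status instead of parsing text themselves. This also shows the cost Sean describes: a changed label means redeploying the wrapper to every node.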
Re: Survey about the parsing of the tooling's output
We parse the output of the following nodetool sub-commands in our custom scripts:

* status
* netstats
* tpstats
* ring

We don't mind the output format changing between major releases as long as all of the following are true:

1. major releases are not too frequent
   e.g. no more frequent than once every couple of years
2. the changes are clearly documented in CHANGES.txt and mentioned in NEWS.txt
   e.g. clearly specify that "someStatistic:" in "nodetool somecommand" is renamed to "Some Statistic:"
3. the functionality is not lost
   e.g. a value is not removed from the output with no obvious alternative
4. it doesn't become a lot harder to parse
   e.g. a value is not split into multiple values with different units that need to be added up to get the original one

We have Ansible playbooks, shell scripts, Python scripts, etc. parsing the output, and to the best of my knowledge, all of them are trivial to rework for minor cosmetic changes like the one given in the example.

Parsing JSON or YAML in vanilla POSIX shell (i.e. without tools such as jq installed) can be much harder, and we would rather not have to deal with that. For Ansible and Python scripts it's a non-issue, but given that we are already parsing the default output and it works fine, we are unlikely to change them to use JSON or YAML instead, unless the pain of dealing with breaking changes is too much and too often.

Querying via CQL is harder, and we would rather not do that for the reasons below:

* it requires Cassandra credentials, instead of the credential-less nodetool command on localhost
* for shell scripts, the cqlsh command output is harder to parse than the nodetool command's, because its output is a human-friendly table with a header, dynamic indentation, field separators, etc., which makes it a less attractive candidate than nodetool
* for Ansible and Python scripts, using the CQL interface will require extra modules/libraries. The extra installation steps required make the scripts themselves less portable between different servers/environments, so we may still prefer the more portable nodetool approach where localhost access is possible
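To illustrate why a cosmetic rename such as "someStatistic:" to "Some Statistic:" is trivial to absorb, here is a small sketch of a lookup that compares labels after normalising them. It is only an illustration of the point, not code from anyone in the thread; `normalise_key` and `read_stat` are hypothetical names, and the label is the made-up one from the survey.

```python
import re


def normalise_key(label):
    """Lower-case a label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def read_stat(output, wanted):
    """Return the integer value of the first line whose label matches `wanted`.

    Labels are compared after normalisation, so 'someStatistic' and
    'Some Statistic' are treated as the same label.
    """
    target = normalise_key(wanted)
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep and normalise_key(label) == target:
            return int(value.strip())
    raise KeyError(wanted)
```

This covers spacing and capitalisation changes only. It does nothing for the harder case in point 4, where one value is split into several with different units, which is why that kind of change is singled out as the more painful one.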