[ https://issues.apache.org/jira/browse/CASSANDRA-20820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Tolbert updated CASSANDRA-20820:
-------------------------------------
Description:

When using {{LeveledCompactionStrategy}} compaction on a table, {{tablestats}} currently provides per-level data:

{noformat}
Keyspace : foo
...
    Table: bar
...
    SSTables in each level: [6, 20/10, 194/100, 862, 0, 0, 0, 0, 0]
    SSTable bytes in each level: [103.91 MiB, 3 GiB, 30.15 GiB, 136.28 GiB, 0 bytes, 0 bytes, 0 bytes, 0 bytes, 0 bytes]
{noformat}

This is really useful information as it helps an operator understand whether L0 is getting backed up, and whether higher levels have their expected 10, 100, 1000, etc. targets.

As {{UnifiedCompactionStrategy}} dynamically places SSTables in levels based on their density, it would also be useful for an operator to know the distribution of their SSTables between levels and stats about SSTables within their levels.

I have a proof of concept that I'm working on ([slack thread|https://the-asf.slack.com/archives/CJZLTM05A/p1754248321995119]) that adds this information by using the UCS {{formLevels}} method to get the distribution of SSTables in their associated levels. The output currently looks like this:

{noformat}
SSTables in each level: [0, 6, 15, 165, 3]
SSTable bytes in each level: [0 bytes, 1.04 GiB, 2.69 GiB, 67.85 GiB, 1.67 GiB]
SSTable Average token space in each level: [0.000, 0.500, 0.083, 0.014, 0.008]
SSTable Average vs Allowed Max Density Ratio in each level: [0.00, 0.73, 0.36, 0.65, 0.10]
SSTable Max vs Allowed Max Density Ratio in each level: [0.00, 0.97, 0.99, 1.00, 0.10]
{noformat}

This also includes 'average token space per level', which shows how much of a token range an SSTable covers on average. That is helpful for ascertaining how much anticompaction may need to be done if incrementally repairing this data.

Showing the ratio of SSTable densities vs max allowed density in that level helps an operator understand how close they are to accumulating SSTables into a new level.
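As a rough illustration of how the per-level rows in the sample output relate to each other, the Python sketch below aggregates a list of SSTable records into the same five rows. It does not use any Cassandra code: the `SSTable` record, its fields, and the per-level `max_density` values are hypothetical inputs, and density is assumed here to be on-disk size divided by the fraction of the token space covered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical record for illustration; not a Cassandra class."""
    level: int          # level index the SSTable was placed in
    size_bytes: int     # on-disk size
    token_space: float  # fraction of the token ring covered, in (0, 1]

    @property
    def density(self) -> float:
        # Assumption: density = size / fraction of token space covered.
        return self.size_bytes / self.token_space


def summarize_levels(sstables, max_density):
    """Build per-level rows shaped like the proposed tablestats output.

    max_density[i] is the assumed maximum density allowed in level i.
    SSTables whose level has no entry in max_density are ignored.
    """
    by_level = defaultdict(list)
    for s in sstables:
        by_level[s.level].append(s)

    rows = {"count": [], "bytes": [], "avg_token_space": [],
            "avg_density_ratio": [], "max_density_ratio": []}
    for level, limit in enumerate(max_density):
        members = by_level.get(level, [])
        n = len(members)
        ratios = [s.density / limit for s in members]
        rows["count"].append(n)
        rows["bytes"].append(sum(s.size_bytes for s in members))
        # Empty levels report 0.0, matching the zeros in the sample output.
        rows["avg_token_space"].append(
            sum(s.token_space for s in members) / n if n else 0.0)
        rows["avg_density_ratio"].append(sum(ratios) / n if n else 0.0)
        rows["max_density_ratio"].append(max(ratios, default=0.0))
    return rows
```

Under these assumptions, a max ratio close to 1.00 in a level means at least one SSTable there is near the density at which it would belong in the next level.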
I would also like to include:
* 'Average SSTable size in each level': Given that UCS has min and target SSTable sizes, it's useful for an operator to know how their SSTables are being sized, and they should be mostly uniform by level.
* 'Shard count in each level': How many shards are assigned to the level. I'm not sure if this is feasible yet, but it would be nice to see.

Some other notes:
* When using Incremental Repair, SSTables being divided into repaired and unrepaired sets tends to skew this data for both LCS and UCS. I'd like to separate the metrics out by these repaired sets.
* What I'm proposing adds quite a bit of output to tablestats, so I need to evaluate whether we can make this concise enough to include, or if the data should be exposed some other way.

Given I am still new to UCS, I'll likely iterate a bit on this. Would appreciate feedback/suggestions.
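To make the note about incremental repair concrete, here is a minimal Python sketch of reporting per-level counts and bytes separately for the repaired and unrepaired sets. The `(level, size_bytes, repaired)` tuples are hypothetical inputs chosen for illustration, not something nodetool or UCS exposes in this form.

```python
def split_by_repair_status(sstables, num_levels):
    """Count SSTables and bytes per level, separately per repair status.

    sstables: iterable of (level, size_bytes, repaired) tuples (hypothetical).
    Returns {"repaired": {...}, "unrepaired": {...}}; each inner dict holds
    "count" and "bytes" lists with one entry per level.
    """
    result = {
        name: {"count": [0] * num_levels, "bytes": [0] * num_levels}
        for name in ("repaired", "unrepaired")
    }
    for level, size_bytes, repaired in sstables:
        bucket = result["repaired" if repaired else "unrepaired"]
        bucket["count"][level] += 1
        bucket["bytes"][level] += size_bytes
    return result
```

Reporting the two sets side by side avoids the skew that comes from summing them into a single per-level row, at the cost of doubling the number of output lines.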
> Include Level information for UnifiedCompactionStrategy in nodetool tablestats output
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20820
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20820
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Tool/nodetool
>            Reporter: Andy Tolbert
>            Assignee: Andy Tolbert
>            Priority: Normal
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org