[ https://issues.apache.org/jira/browse/CASSANDRA-20820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Wang updated CASSANDRA-20820:
----------------------------------
    Attachment: ci_summary.html

> Include Level information for UnifiedCompactionStrategy in nodetool 
> tablestats output
> -------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20820
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20820
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Tool/nodetool
>            Reporter: Andy Tolbert
>            Assignee: Alan Wang
>            Priority: Normal
>         Attachments: ci_summary.html, result_details.tar.gz
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When using {{LeveledCompactionStrategy}} on a table, {{tablestats}} 
> currently provides per-level data:
> {noformat}
> Keyspace : foo
> ...
>                 Table: bar
>                 ...
>                 SSTables in each level: [6, 20/10, 194/100, 862, 0, 0, 0, 0, 0]
>                 SSTable bytes in each level: [103.91 MiB, 3 GiB, 30.15 GiB, 136.28 GiB, 0 bytes, 0 bytes, 0 bytes, 0 bytes, 0 bytes]
> {noformat}
> This is really useful information as it helps an operator understand whether 
> L0 is getting backed up, and whether higher levels have their expected 10, 
> 100, 1000, etc. targets.
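
The "count/target" notation in the sample above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: a fanout of 10, a target of fanout**N SSTables for level N >= 1, and no target for L0. These were chosen to reproduce the sample output and do not describe nodetool's actual implementation. `format_level_counts` is a hypothetical helper name.

```python
def format_level_counts(counts, fanout=10):
    """Render per-level SSTable counts, appending '/target' when a level is over target.

    Assumptions (not taken from nodetool's source): level N (N >= 1) targets
    fanout**N SSTables, and L0 has no target in this sketch.
    """
    parts = []
    for level, count in enumerate(counts):
        target = fanout ** level if level > 0 else None
        if target is not None and count > target:
            # Level holds more SSTables than expected: show count/target.
            parts.append(f"{count}/{target}")
        else:
            parts.append(str(count))
    return "[" + ", ".join(parts) + "]"
```

Read this way, L1 and L2 in the sample are over their targets of 10 and 100, while L3 (862) is still under 1000.
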
> As {{UnifiedCompactionStrategy}} dynamically places SSTables in levels based 
> on their density, it would also be useful for an operator to know the 
> distribution of their SSTables between levels and stats about SSTables within 
> their levels.
> I have a proof of concept that I'm working on ([slack 
> thread|https://the-asf.slack.com/archives/CJZLTM05A/p1754248321995119]) that 
> adds this information by using the UCS {{formLevels}} method to get the 
> distribution of SSTables in their associated levels. The output currently 
> looks like this:
> {noformat}
> SSTables in each level: [0, 6, 15, 165, 3]
> SSTable bytes in each level: [0 bytes, 1.04 GiB, 2.69 GiB, 67.85 GiB, 1.67 GiB]
> SSTable Average token space in each level: [0.000, 0.500, 0.083, 0.014, 0.008]
> SSTable Average vs Allowed Max Density Ratio in each level: [0.00, 0.73, 0.36, 0.65, 0.10]
> SSTable Max vs Allowed Max Density Ratio in each level: [0.00, 0.97, 0.99, 1.00, 0.10]
> {noformat}
> This also includes 'average token space per level', which shows how much of 
> the token range an SSTable covers on average. That helps estimate how much 
> anticompaction may be needed when incrementally repairing this data.
> Showing the ratio of SSTable densities vs max allowed density in that level 
> helps an operator understand how close they are to accumulating SSTables into 
> a new level.
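
The per-level figures described above could be aggregated roughly as in the following Python sketch. It is not the proof of concept from the ticket. It assumes each SSTable already has a level assigned, takes density as size divided by the fraction of the token space covered, and receives the maximum allowed density per level as an input. `SSTable`, `level_stats`, and `max_density_by_level` are hypothetical names for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SSTable:
    level: int          # level index, assumed to be assigned elsewhere
    size_bytes: int     # on-disk size
    token_share: float  # fraction of the token space covered, in (0, 1]

    @property
    def density(self) -> float:
        # Density taken as size divided by covered token-space fraction.
        return self.size_bytes / self.token_share


def level_stats(sstables, max_density_by_level):
    """Aggregate count, bytes, average token share and density ratios per level.

    max_density_by_level maps level -> maximum allowed density (an assumed input).
    """
    by_level = defaultdict(list)
    for sstable in sstables:
        by_level[sstable.level].append(sstable)

    stats = {}
    for level, max_density in sorted(max_density_by_level.items()):
        members = by_level.get(level, [])
        if not members:
            # Empty levels report zeros, as in the sample output.
            stats[level] = {"count": 0, "bytes": 0, "avg_token_share": 0.0,
                            "avg_density_ratio": 0.0, "max_density_ratio": 0.0}
            continue
        densities = [s.density for s in members]
        stats[level] = {
            "count": len(members),
            "bytes": sum(s.size_bytes for s in members),
            "avg_token_share": sum(s.token_share for s in members) / len(members),
            "avg_density_ratio": (sum(densities) / len(densities)) / max_density,
            "max_density_ratio": max(densities) / max_density,
        }
    return stats
```

A maximum ratio close to 1.00 in a level would then indicate that its densest SSTable is near that level's allowed limit.
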
> I would also like to include:
>  * 'Average SSTable size in each level': Given UCS has min and target SSTable 
> sizes, it's useful for an operator to know how their SSTables are being sized, 
> and they should be mostly uniform by level.
>  * 'Shard count in each level': How many shards are assigned to the level. 
> I'm not sure if this is feasible yet, but it would be nice to see.
> Some other notes:
>  * When using Incremental Repair, SSTables being divided into repaired and 
> unrepaired sets tends to skew this data for both LCS and UCS. I'd like to 
> separate the metrics out by these repaired sets.
>  * What I'm proposing adds quite a bit of output to tablestats, so we need 
> to evaluate whether we can make this concise enough to include, or whether 
> the data should be exposed some other way.
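
Separating the per-level counts by repaired and unrepaired sets, as suggested in the notes above, might look like this Python sketch. It assumes each SSTable can be reduced to a (level, is_repaired) pair. `counts_by_repair_state` is a hypothetical helper, not an existing Cassandra or nodetool API.

```python
from collections import Counter


def counts_by_repair_state(sstables, num_levels):
    """Count SSTables per level, split into repaired and unrepaired sets.

    sstables: iterable of (level, is_repaired) pairs (an assumed representation).
    """
    tally = Counter((level, bool(repaired)) for level, repaired in sstables)
    return {
        "repaired": [tally[(level, True)] for level in range(num_levels)],
        "unrepaired": [tally[(level, False)] for level in range(num_levels)],
    }
```

Each set can then be printed as its own "SSTables in each level" line, so one set does not skew the figures for the other.
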
> Given I am still new to UCS, I'll likely iterate a bit on this. Would 
> appreciate feedback/suggestions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)