[
https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718664#comment-17718664
]
Andres de la Peña commented on CASSANDRA-8720:
----------------------------------------------
Here is the {{sstablepartitions}} offline tool originally written by [~snazy]:
||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2303]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2877/workflows/e683e442-fa86-4fe5-af6c-6a57665b3a93]
[j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2877/workflows/53f41727-77d8-494c-a2d1-ca4b1c71d4eb]|
I have adapted it to the current codebase, fixed a couple of bugs and added
tests for it.
It can be used for finding large partitions in sstables. For example, to find
partitions over 100MiB:
{code}
> sstablepartitions data/data/k/t-d7be5e90e90111ed8b54efe3c39cb0bb/nc-8-big-Data.db --min-size 100MiB
Processing k.t-d7be5e90e90111ed8b54efe3c39cb0bb #8 (big-nc) (1.368 GiB uncompressed, 534.979 MiB on disk)
Partition: '13' (0000000d) live, size: 105.056 MiB, rows: 91490, cells: 274470, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Partition: '1' (00000001) live, size: 127.241 MiB, rows: 111065, cells: 333195, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Partition: '8' (00000008) live, size: 356.067 MiB, rows: 310706, cells: 932118, tombstones: 0 (row:0, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Partition: '2' (00000002) live, size: 213.341 MiB, rows: 186582, cells: 559125, tombstones: 978 (row:978, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Summary of k.t-d7be5e90e90111ed8b54efe3c39cb0bb #8 (big-nc):
File: /Users/adelapena/src/cassandra/trunk/data/data/k/t-d7be5e90e90111ed8b54efe3c39cb0bb/nc-8-big-Data.db
4 partitions match
Keys: 13 1 8 2
         Partition size   Row count   Cell count   Tombstone count
  p50       767.519 KiB         770         1916                 1
  p75         2.238 MiB        2299         5722                 1
  p90         3.867 MiB        3311         9887                50
  p95        16.629 MiB       14237        42510               446
  p99       148.267 MiB      126934       379022              1331
  p999      368.936 MiB      315852       943127              2759
  min        49.817 KiB          87          150                 0
  max       368.936 MiB      315852       943127              2759
  count             210
{code}
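The ticket description also asks for a way to aggregate this information across sstables. As a rough illustration only, here is a minimal Python sketch of such a wrapper. It is not part of the PR. It assumes the tool prints each {{Partition:}} record on a single line in exactly the format shown above, and that summing per-key sizes and counts across sstables is an acceptable approximation of a table-level view. The names {{PartitionStat}}, {{parse_partitions}} and {{aggregate}} are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, NamedTuple

# Binary size units; assumed from the sample output, which shows KiB, MiB and GiB.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Assumed record format, copied from the sample output:
# Partition: '13' (0000000d) live, size: 105.056 MiB, rows: 91490, cells: 274470, tombstones: 50 (...)
PARTITION_RE = re.compile(
    r"Partition: '(?P<key>.*)' \((?P<hex>[0-9a-fA-F]+)\) \w+, "
    r"size: (?P<size>[\d.]+) (?P<unit>\w+), "
    r"rows: (?P<rows>\d+), cells: (?P<cells>\d+), tombstones: (?P<tombstones>\d+)"
)


class PartitionStat(NamedTuple):
    key: str
    size_bytes: int
    rows: int
    cells: int
    tombstones: int


def parse_partitions(lines: Iterable[str]) -> Iterator[PartitionStat]:
    """Yield one PartitionStat per 'Partition:' line, skipping everything else."""
    for line in lines:
        m = PARTITION_RE.search(line)
        if m is None:
            continue
        size = float(m["size"]) * UNITS[m["unit"]]
        yield PartitionStat(
            m["key"], int(size), int(m["rows"]), int(m["cells"]), int(m["tombstones"])
        )


def aggregate(stats: Iterable[PartitionStat]) -> List[PartitionStat]:
    """Sum the figures per partition key and sort by total size, largest first."""
    totals: Dict[str, PartitionStat] = {}
    for s in stats:
        prev = totals.get(s.key)
        if prev is None:
            totals[s.key] = s
        else:
            totals[s.key] = PartitionStat(
                s.key,
                prev.size_bytes + s.size_bytes,
                prev.rows + s.rows,
                prev.cells + s.cells,
                prev.tombstones + s.tombstones,
            )
    return sorted(totals.values(), key=lambda s: s.size_bytes, reverse=True)
```

One way to use it would be to concatenate the output of several runs and pass the lines to {{aggregate(parse_partitions(lines))}}. Note that summing across sstables overcounts data that is shadowed or duplicated between sstables, so the totals are an upper bound rather than the compacted partition size.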
> Provide tools for finding wide row/partition keys
> -------------------------------------------------
>
> Key: CASSANDRA-8720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Tools
> Reporter: J.B. Langston
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
> Attachments: 8720.txt
>
>
> Multiple users have requested some sort of tool to help identify wide row
> keys. They get into a situation where they know a wide row/partition has been
> inserted and it's causing problems for them but they have no idea what the
> row key is in order to remove it.
> Maintaining the widest row key currently encountered and displaying it in
> cfstats would be one possible approach.
> Another would be an offline tool (possibly an enhancement to sstablekeys) to
> show the number of columns/bytes per key in each sstable. If a tool to
> aggregate the information at a CF-level could be provided that would be a
> bonus, but it shouldn't be too hard to write a script wrapper to aggregate
> them if not.