[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718664#comment-17718664 ]

Andres de la Peña commented on CASSANDRA-8720:
----------------------------------------------

Here is the {{sstablepartitions}} offline tool originally written by [~snazy]:

||PR||CI||
|[trunk|https://github.com/apache/cassandra/pull/2303]|[j8|https://app.circleci.com/pipelines/github/adelapena/cassandra/2877/workflows/e683e442-fa86-4fe5-af6c-6a57665b3a93] [j11|https://app.circleci.com/pipelines/github/adelapena/cassandra/2877/workflows/53f41727-77d8-494c-a2d1-ca4b1c71d4eb]|

I have adapted it to the current codebase, fixed a couple of bugs and added 
tests for it.

It can be used for finding large partitions in sstables. For example, to find 
partitions over 100MiB:
{code}
> sstablepartitions data/data/k/t-d7be5e90e90111ed8b54efe3c39cb0bb/nc-8-big-Data.db --min-size 100MiB

Processing k.t-d7be5e90e90111ed8b54efe3c39cb0bb #8 (big-nc) (1.368 GiB uncompressed, 534.979 MiB on disk)
  Partition: '13' (0000000d) live, size: 105.056 MiB, rows: 91490, cells: 274470, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '1' (00000001) live, size: 127.241 MiB, rows: 111065, cells: 333195, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '8' (00000008) live, size: 356.067 MiB, rows: 310706, cells: 932118, tombstones: 0 (row:0, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '2' (00000002) live, size: 213.341 MiB, rows: 186582, cells: 559125, tombstones: 978 (row:978, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Summary of k.t-d7be5e90e90111ed8b54efe3c39cb0bb #8 (big-nc):
  File: /Users/adelapena/src/cassandra/trunk/data/data/k/t-d7be5e90e90111ed8b54efe3c39cb0bb/nc-8-big-Data.db
  4 partitions match
  Keys: 13 1 8 2
              Partition size            Row count           Cell count      Tombstone count
  p50            767.519 KiB                  770                 1916                    1
  p75              2.238 MiB                 2299                 5722                    1
  p90              3.867 MiB                 3311                 9887                   50
  p95             16.629 MiB                14237                42510                  446
  p99            148.267 MiB               126934               379022                 1331
  p999           368.936 MiB               315852               943127                 2759
  min             49.817 KiB                   87                  150                    0
  max            368.936 MiB               315852               943127                 2759
  count                  210
{code}
{code}
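
The per-partition lines in that output are regular enough to post-process with a small script, for instance to rank the partitions reported across several sstables. The following is only a rough sketch and not part of the tool or the PR: it assumes the line format stays exactly as in the example run above, that sizes use the B/KiB/MiB/GiB/TiB suffixes, and the names {{parse_partitions}} and {{largest}} are hypothetical.

```python
import re

# Pattern for the "Partition:" lines as they appear in the example run above.
# Assumption: the format is stable; this is not a documented interface.
PARTITION_RE = re.compile(
    r"Partition: '(?P<key>.*?)' \((?P<hex>[0-9a-f]+)\) (?P<state>\w+), "
    r"size: (?P<size>[\d.]+) (?P<unit>B|KiB|MiB|GiB|TiB), "
    r"rows: (?P<rows>\d+), cells: (?P<cells>\d+), "
    r"tombstones: (?P<tombstones>\d+)"
)

# Assumed binary unit suffixes, matching the KiB/MiB/GiB seen in the output.
UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_partitions(text):
    """Return one dict per 'Partition:' record found in the tool output."""
    # Collapse all whitespace first so records still match when a mail
    # client or terminal has wrapped the long lines.
    flat = " ".join(text.split())
    partitions = []
    for m in PARTITION_RE.finditer(flat):
        partitions.append({
            "key": m.group("key"),
            "state": m.group("state"),
            "size_bytes": round(float(m.group("size")) * UNIT_BYTES[m.group("unit")]),
            "rows": int(m.group("rows")),
            "cells": int(m.group("cells")),
            "tombstones": int(m.group("tombstones")),
        })
    return partitions


def largest(partitions, n=10):
    """Return the n biggest partitions by size, largest first."""
    return sorted(partitions, key=lambda p: p["size_bytes"], reverse=True)[:n]
```

Feeding the captured output of one or more runs to {{parse_partitions}} and passing the combined list to {{largest}} gives a simple cross-sstable ranking. Note that the same key can appear in several sstables, so a real aggregation would probably want to sum sizes per key first.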
 

> Provide tools for finding wide row/partition keys
> -------------------------------------------------
>
>                 Key: CASSANDRA-8720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Tools
>            Reporter: J.B. Langston
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 3.11.x, 4.0.x
>
>         Attachments: 8720.txt
>
>
> Multiple users have requested some sort of tool to help identify wide row 
> keys. They get into a situation where they know a wide row/partition has been 
> inserted and it's causing problems for them but they have no idea what the 
> row key is in order to remove it.  
> Maintaining the widest row key currently encountered and displaying it in 
> cfstats would be one possible approach.
> Another would be an offline tool (possibly an enhancement to sstablekeys) to 
> show the number of columns/bytes per key in each sstable. If a tool to 
> aggregate the information at a CF-level could be provided that would be a 
> bonus, but it shouldn't be too hard to write a script wrapper to aggregate 
> them if not.



