[ 
https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718911#comment-17718911
 ] 

Andres de la Peña edited comment on CASSANDRA-8720 at 5/3/23 12:06 PM:
-----------------------------------------------------------------------

Thanks :)

The data in the table comes from 
[{{EstimatedHistogram}}|https://github.com/apache/cassandra/blob/d2923275e360a1ee9db498e748c269f701bb3a8b/src/java/org/apache/cassandra/utils/EstimatedHistogram.java]s,
 like the ones we use for metrics such as those in 
[{{MetadataCollector}}|https://github.com/apache/cassandra/blob/b7e1e44a909c3a1d11e9c387db680c74d31b879f/src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java#L60-L71].
 Histograms do sampling, so their results are not entirely accurate.

However, it would be easy to track the min and max metrics so they are exact. 
Indeed, max seems particularly important when trying to detect large 
partitions.
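
As a rough illustration of the idea (a sketch only, not the code in the patch and not the real {{EstimatedHistogram}}), the hypothetical class below keeps bucketed counts for percentiles and tracks min/max separately on every update. The class name, the bucket bounds and the percentile rule are all assumptions made for the example. It also shows why an approximate high percentile can exceed the exact max, as ~p999 (368.936 MiB) does against max (356.067 MiB) in the tool output quoted in this comment: the percentile is reported as a bucket bound, not as an observed value.

```java
import java.util.Arrays;

/** Hypothetical sketch: bucketed percentiles plus exactly tracked min/max. */
final class ApproxHistogramWithExactBounds {
    private final long[] bucketUpperBounds; // ascending, inclusive upper bounds
    private final long[] counts;            // one extra slot for overflow values
    private long total;
    private long min = Long.MAX_VALUE;      // exact, updated on every add
    private long max = Long.MIN_VALUE;      // exact, updated on every add

    ApproxHistogramWithExactBounds(long... bucketUpperBounds) {
        this.bucketUpperBounds = bucketUpperBounds.clone();
        this.counts = new long[bucketUpperBounds.length + 1];
    }

    void add(long value) {
        int idx = Arrays.binarySearch(bucketUpperBounds, value);
        if (idx < 0)
            idx = -idx - 1; // insertion point: first bound greater than value
        counts[idx]++;
        total++;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Approximate: returns the upper bound of the bucket holding the rank. */
    long percentile(double p) {
        long rank = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < bucketUpperBounds.length; i++) {
            seen += counts[i];
            if (seen >= rank)
                return bucketUpperBounds[i];
        }
        return Long.MAX_VALUE; // rank falls in the overflow bucket
    }

    long min() { return min; }
    long max() { return max; }

    public static void main(String[] args) {
        ApproxHistogramWithExactBounds h = new ApproxHistogramWithExactBounds(10, 100, 1000);
        h.add(7);
        h.add(42);
        h.add(356);
        System.out.println("~p999 = " + h.percentile(0.999)); // bucket bound: 1000
        System.out.println("max   = " + h.max());             // exact value: 356
    }
}
```

Here the approximate p999 is 1000 while the exact max is 356, which is the same kind of gap that the tilde prefix is meant to flag.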

I have updated the patch to do that exact calculation of min/max. I have also 
added a tilde (~) prefix to the percentiles coming from the histogram in an 
attempt to indicate that they are approximate. So the output of the tool now 
looks like:
{code:java}
> sstablepartitions data/data/k/t-d7be5e90e90111ed8b54efe3c39cb0bb/nc-8-big-Data.db --min-size 100MiB

Processing  #8 (big-nc) (1.368 GiB uncompressed, 534.979 MiB on disk)
  Partition: '13' (0000000d) live, size: 105.056 MiB, rows: 91490, cells: 274470, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '1' (00000001) live, size: 127.241 MiB, rows: 111065, cells: 333195, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '8' (00000008) live, size: 356.067 MiB, rows: 310706, cells: 932118, tombstones: 0 (row:0, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '2' (00000002) live, size: 213.341 MiB, rows: 186582, cells: 559125, tombstones: 978 (row:978, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Summary of  #8 (big-nc):
  File: /Users/adelapena/Desktop/sstablepartitions/nc-8-big-Data.db
  4 partitions match
  Keys: 13 1 8 2
               Partition size            Row count           Cell count      Tombstone count
  ~p50            767.519 KiB                  770                 1916                    0
  ~p75              2.238 MiB                 2299                 5722                    0
  ~p90              3.867 MiB                 3311                 9887                   50
  ~p95             16.629 MiB                14237                42510                  446
  ~p99            148.267 MiB               126934               379022                 1331
  ~p999           368.936 MiB               315852               943127                 2759
  min              56.854 KiB                  100                  150                    0
  max             356.067 MiB               310706               932118                 2450
  count                   21
{code}
Note also that the min/max rows in the table don't represent a single 
partition. In the previous example the max size of 356.067 MiB comes from 
partition '8', whereas the max number of tombstones (2450) comes from another 
partition. That partition is not listed because its size is below the 100MiB 
threshold. We can find the key of the partition with 2450 tombstones if we add 
a {{--min-tombstones}} threshold to the command:
{code:java}
> sstablepartitions data/data/k/t-d7be5e90e90111ed8b54efe3c39cb0bb/nc-8-big-Data.db --min-size 100MiB --min-tombstones 2450

Processing  #8 (big-nc) (1.368 GiB uncompressed, 534.979 MiB on disk)
  Partition: '13' (0000000d) live, size: 105.056 MiB, rows: 91490, cells: 274470, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '1' (00000001) live, size: 127.241 MiB, rows: 111065, cells: 333195, tombstones: 50 (row:50, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '8' (00000008) live, size: 356.067 MiB, rows: 310706, cells: 932118, tombstones: 0 (row:0, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '2' (00000002) live, size: 213.341 MiB, rows: 186582, cells: 559125, tombstones: 978 (row:978, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
  Partition: '21' (00000015) live, size: 3.853 MiB, rows: 4900, cells: 9927, tombstones: 2450 (row:2450, range:0, complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Summary of  #8 (big-nc):
  File: /Users/adelapena/Desktop/sstablepartitions/nc-8-big-Data.db
  5 partitions match
  Keys: 13 1 8 2 21
               Partition size            Row count           Cell count      Tombstone count
  ~p50            767.519 KiB                  770                 1916                    0
  ~p75              2.238 MiB                 2299                 5722                    0
  ~p90              3.867 MiB                 3311                 9887                   50
  ~p95             16.629 MiB                14237                42510                  446
  ~p99            148.267 MiB               126934               379022                 1331
  ~p999           368.936 MiB               315852               943127                 2759
  min              56.854 KiB                  100                  150                    0
  max             356.067 MiB               310706               932118                 2450
  count                   210
{code}
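
Judging only from the two outputs above, the thresholds seem to combine as "match any": partition '21' is listed in the second run even though its size (3.853 MiB) is well below {{--min-size}}. The sketch below models that reading with a hypothetical {{ThresholdFilterSketch}} class; the class, its methods and the inclusive {{>=}} comparison are assumptions made for illustration, not the tool's actual implementation. The sample values are copied from the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of "match any threshold" partition filtering. */
final class ThresholdFilterSketch {
    static final class Partition {
        final String key;
        final double sizeMiB;
        final long tombstones;

        Partition(String key, double sizeMiB, long tombstones) {
            this.key = key;
            this.sizeMiB = sizeMiB;
            this.tombstones = tombstones;
        }
    }

    /** Sample partitions taken from the tool output quoted above. */
    static List<Partition> sample() {
        return Arrays.asList(new Partition("13", 105.056, 50),
                             new Partition("1", 127.241, 50),
                             new Partition("8", 356.067, 0),
                             new Partition("2", 213.341, 978),
                             new Partition("21", 3.853, 2450));
    }

    /**
     * Returns the keys of partitions reaching at least one threshold.
     * Pass Long.MAX_VALUE as minTombstones to model an unset threshold.
     */
    static List<String> matchingKeys(List<Partition> partitions, double minSizeMiB, long minTombstones) {
        List<String> keys = new ArrayList<>();
        for (Partition p : partitions) {
            // Assumed semantics: either condition is enough to list the partition.
            if (p.sizeMiB >= minSizeMiB || p.tombstones >= minTombstones)
                keys.add(p.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(matchingKeys(sample(), 100.0, Long.MAX_VALUE)); // [13, 1, 8, 2]
        System.out.println(matchingKeys(sample(), 100.0, 2450));           // [13, 1, 8, 2, 21]
    }
}
```

Under this assumed rule the sample data reproduces the "Keys" lines of both runs, including the extra key 21 once the tombstone threshold is added.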



> Provide tools for finding wide row/partition keys
> -------------------------------------------------
>
>                 Key: CASSANDRA-8720
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Tools
>            Reporter: J.B. Langston
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 8720.txt
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Multiple users have requested some sort of tool to help identify wide row 
> keys. They get into a situation where they know a wide row/partition has been 
> inserted and it's causing problems for them but they have no idea what the 
> row key is in order to remove it.  
> Maintaining the widest row key currently encountered and displaying it in 
> cfstats would be one possible approach.
> Another would be an offline tool (possibly an enhancement to sstablekeys) to 
> show the number of columns/bytes per key in each sstable. If a tool to 
> aggregate the information at a CF-level could be provided that would be a 
> bonus, but it shouldn't be too hard to write a script wrapper to aggregate 
> them if not.


