[jira] [Commented] (CASSANDRA-7247) Provide top ten most frequent keys per column family

Benedict (JIRA) Sun, 21 Sep 2014 01:01:47 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142375#comment-14142375 ]


Benedict commented on CASSANDRA-7247:
-------------------------------------

It's probably better to construct a lightweight wrapper around the data you're
using for equality (key bytes / token), with knowledge of _how_ to turn it into
a string, and to do so only when we're asked for the TopK. It could well be
worth enabling this on a per-CF / per-KS basis, though, or configuring the size
of the sample in the yaml. If you have large keys (64 KB), the structure as it
stands will take up > 128 MB per keyspace, or > 64 MB with the adjustment I've
just suggested. Either way that's non-trivial, especially since we have two of
them. Admittedly, such large keys are unlikely to be common.
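A minimal sketch of the wrapper suggested above, assuming sampled keys arrive as `ByteBuffer`s. `SampledKey` and its `hex` formatter are hypothetical names, not Cassandra classes. Equality and hashing use only the raw key bytes; the human-readable string is built only when `toString()` is called (e.g. when the TopK is read over JMX). The formatter is a single shared function, assumed in practice to be something like the table's key validator, so each sampled entry holds a reference to the key bytes rather than its own `String` copy.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

/**
 * Hypothetical wrapper around a sampled partition key (not a Cassandra class).
 * equals/hashCode use only the raw key bytes; the string form is produced
 * lazily by a shared formatter, so no String is built while sampling.
 * Assumes the wrapped buffer is not mutated after construction, because
 * ByteBuffer.equals/hashCode depend on its remaining contents.
 */
final class SampledKey {
    private final ByteBuffer keyBytes;                     // used for equality
    private final Function<ByteBuffer, String> formatter; // shared, e.g. per table

    SampledKey(ByteBuffer keyBytes, Function<ByteBuffer, String> formatter) {
        this.keyBytes = keyBytes;
        this.formatter = formatter;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SampledKey && keyBytes.equals(((SampledKey) o).keyBytes);
    }

    @Override
    public int hashCode() {
        return keyBytes.hashCode();
    }

    /** Only called when the top-K list is reported, never on the sampling path. */
    @Override
    public String toString() {
        // duplicate() so formatting never moves the wrapped buffer's position
        return formatter.apply(keyBytes.duplicate());
    }

    /** Hypothetical stand-in formatter: lowercase hex of the remaining bytes (consumes b). */
    static String hex(ByteBuffer b) {
        StringBuilder sb = new StringBuilder(b.remaining() * 2);
        while (b.hasRemaining()) {
            sb.append(String.format("%02x", b.get() & 0xff));
        }
        return sb.toString();
    }
}
```

Keyed on such a wrapper, each sampled key costs a hash over its bytes; the string conversion is paid once per reported key, only when the TopK is requested.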

> Provide top ten most frequent keys per column family
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7247
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Assignee: Chris Lohfink
>            Priority: Minor
>         Attachments: cassandra-2.1-7247.txt, jconsole.png, patch.txt
>
>
> Since we already have the nice addthis stream library, we can use it to keep track
> of the most frequent DecoratedKeys that come through the system using
> StreamSummaries ([nice
> explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
> Then provide a new metric to access them via JMX.
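The Space-Saving algorithm behind StreamSummary can be sketched in a few lines. This is a simplified, self-contained illustration, not stream-lib's code: the class name `SpaceSaving` is hypothetical, and eviction uses a linear scan for the minimum counter instead of the bucket list of the Stream-Summary structure that makes each update O(1). The property it demonstrates is the one the linked explanation relies on: at most `capacity` counters are kept, and each reported count overestimates the true frequency by at most the recorded error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified Space-Saving heavy-hitters sketch (hypothetical class, not
 * stream-lib's StreamSummary). Keeps at most `capacity` counters; when full,
 * the counter with the smallest count is evicted and the new item inherits
 * that count + 1, recording the inherited amount as its maximum error.
 */
final class SpaceSaving<T> {
    private static final class Counter {
        long count;
        final long error;
        Counter(long count, long error) { this.count = count; this.error = error; }
    }

    private final int capacity;
    private final Map<T, Counter> counters = new HashMap<>();

    SpaceSaving(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void offer(T item) {
        Counter c = counters.get(item);
        if (c != null) {                    // already monitored: exact increment
            c.count++;
            return;
        }
        if (counters.size() < capacity) {   // free slot: start an exact counter
            counters.put(item, new Counter(1, 0));
            return;
        }
        // Full: evict the minimum counter (linear scan, O(capacity) per miss).
        T minItem = null;
        long minCount = Long.MAX_VALUE;
        for (Map.Entry<T, Counter> e : counters.entrySet()) {
            if (e.getValue().count < minCount) {
                minCount = e.getValue().count;
                minItem = e.getKey();
            }
        }
        counters.remove(minItem);
        counters.put(item, new Counter(minCount + 1, minCount));
    }

    /** Up to k monitored items, highest estimated count first. */
    List<T> topK(int k) {
        List<Map.Entry<T, Counter>> entries = new ArrayList<>(counters.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue().count, a.getValue().count));
        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    long count(T item) { Counter c = counters.get(item); return c == null ? 0 : c.count; }

    long error(T item) { Counter c = counters.get(item); return c == null ? 0 : c.error; }
}
```

With capacity 2 and the stream a, a, a, b, c, a, `b` is evicted when `c` arrives, so `c` is reported with count 2 and error 1 (its true count is 1), while `a` keeps its exact count of 4.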



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
