[ 
https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621
 ] 

Chris Lohfink commented on CASSANDRA-7247:
------------------------------------------

The problem is that StreamSummary is not thread safe.  There is a 
ConcurrentStreamSummary, but in this implementation I found it to be ~5x 
slower than a synchronized block around the offer of the non-thread-safe one.  
The concurrent version did perform similarly when it was also wrapped in a 
synchronized block, which I will show below, but since serializing access 
throws away any benefit of a concurrent implementation, I think the faster 
impl is best.
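
For illustration only, here is a rough sketch of the "synchronized block around 
offer" approach. It is not the attached patch and does not use stream-lib: the 
class SynchronizedTopKeys, its methods, and the capacity of 10 are all 
hypothetical, and the inner map is a simplified Space-Saving counter standing 
in for the non-thread-safe StreamSummary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a non-thread-safe top-k structure guarded by one lock.
class SynchronizedTopKeys {
    private final int capacity;
    private final Map<String, Long> counts = new HashMap<>();
    private final Object lock = new Object();

    SynchronizedTopKeys(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
    }

    // Simplified Space-Saving update. Not thread safe on its own.
    private void offerUnsafe(String key) {
        Long current = counts.get(key);
        if (current != null) {
            counts.put(key, current + 1);
        } else if (counts.size() < capacity) {
            counts.put(key, 1L);
        } else {
            // Evict the smallest counter; the newcomer inherits its count plus one.
            String minKey = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < min) {
                    min = e.getValue();
                    minKey = e.getKey();
                }
            }
            counts.remove(minKey);
            counts.put(key, min + 1);
        }
    }

    // All writers go through this single synchronized block.
    void offer(String key) {
        synchronized (lock) {
            offerUnsafe(key);
        }
    }

    // Sum of all counters; equals the number of offers if no update was lost.
    long total() {
        synchronized (lock) {
            long sum = 0;
            for (long c : counts.values()) sum += c;
            return sum;
        }
    }

    // Key with the largest (possibly overestimated) counter, or null if empty.
    String topKey() {
        synchronized (lock) {
            String best = null;
            long bestCount = -1;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    bestCount = e.getValue();
                    best = e.getKey();
                }
            }
            return best;
        }
    }

    // Hammers one tracker from several threads with a hot key plus unique cold keys.
    static SynchronizedTopKeys runDemo(int threads, int hotPerThread, int coldPerThread) {
        SynchronizedTopKeys top = new SynchronizedTopKeys(10);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < hotPerThread; i++) {
                    top.offer("hot");
                    if (i < coldPerThread) top.offer("cold-" + id + "-" + i);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return top;
    }

    public static void main(String[] args) {
        SynchronizedTopKeys top = runDemo(4, 1000, 100);
        System.out.println("top key = " + top.topKey() + ", total offers = " + top.total());
    }
}
```

The point of the single lock is that the unsafe structure never sees two 
writers at once, so no offers are lost; whether that beats a concurrent 
implementation under real load is exactly what has to be measured.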

Done on a 2013 Retina MBP with a 500 GB SSD:

{code:title=No Changes}
            id, ops       ,    op/s,   key/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr
 4 threadCount, 634450    ,   21692,   21692,     0.2,     0.2,     0.2,     0.2,     0.4,   740.1,   29.2,  0.01188
 8 threadCount, 886600    ,   29762,   29762,     0.3,     0.2,     0.3,     0.4,     1.3,  1007.3,   29.8,  0.01220
16 threadCount, 912050    ,   29035,   29035,     0.5,     0.3,     0.9,     2.5,    11.2,  1393.8,   31.4,  0.01162
24 threadCount, 1022250   ,   32681,   32681,     0.7,     0.5,     1.0,     2.9,    13.5,  1126.5,   31.3,  0.00923
36 threadCount, 946550    ,   30900,   30900,     1.2,     0.8,     1.4,     3.0,    22.5,  1369.2,   30.6,  0.01089
{code}

{code:title=With Patch}
            id, ops       ,    op/s,   key/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr
 4 threadCount, 643900    ,   21700,   21700,     0.2,     0.2,     0.2,     0.2,     0.9,   941.1,   29.7,  0.01079
 8 threadCount, 942100    ,   32300,   32300,     0.2,     0.2,     0.3,     0.3,     1.2,   849.5,   29.2,  0.01519
16 threadCount, 907400    ,   30650,   30650,     0.5,     0.3,     0.8,     1.9,    10.7,  1124.0,   29.6,  0.01112
24 threadCount, 1026150   ,   31753,   31753,     0.7,     0.5,     0.9,     3.3,    20.6,  1299.0,   32.3,  0.01295
36 threadCount, 980600    ,   30077,   30077,     1.2,     0.8,     1.3,     2.7,    24.9,  1394.3,   32.6,  0.01747
{code}

> Provide top ten most frequent keys per column family
> ----------------------------------------------------
>
>                 Key: CASSANDRA-7247
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Lohfink
>            Priority: Minor
>         Attachments: patch.diff
>
>
> Since we already have the nice addthis stream library, we can use it to 
> keep track of the most frequent DecoratedKeys that come through the system 
> using StreamSummaries ([nice 
> explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]).
>   Then provide a new metric to access them via JMX.  



