[ https://issues.apache.org/jira/browse/CASSANDRA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858245#comment-15858245 ]

Jeff Jirsa commented on CASSANDRA-13038:
----------------------------------------

[~slebresne] - my main objection is that I know people (not me) who are using 5 
minute TWCS windows, and presumably they're doing so because they need to 
expire data in 5 minute chunks. Rounding the TTL histogram up to 1 hour 
resolution would eliminate that use case, and it's not a hypothetical use case. 
There is a lower bound below which finer resolution becomes much less useful 
(maybe it's 5 minutes, or 1 minute), but it's not 1 hour. 

In any case, I'll take the ticket. I had implemented one approach using a larger 
temporary spool of bins that we merge/compact down into the main set on use, 
and it's about 3-5x faster without further loss of resolution. I'll extend that 
to add 60-second rounding and see what that does - I suspect the gain will be 
significant, especially in the cases where we're dealing with a day's (or 
week's, or month's) worth of data inserted with a default TTL. Depending on how 
that looks, we can decide whether 60s or 300s is the right rounding point. 
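
For illustration only, here is a minimal sketch of the spool-plus-rounding idea 
described above. The class and parameter names (SpooledHistogram, spoolSize, 
roundSeconds) are hypothetical and this is not the code in 
compaction-speedup.patch; it just shows the shape of the technique: buffer 
updates in a larger temporary map, round drop times to a coarser boundary, and 
only merge down to the bin cap when the spool fills or the histogram is read.

{code}
import java.util.Map;
import java.util.TreeMap;

public class SpooledHistogram
{
    private final int maxBinSize;      // e.g. 100, as in StreamingHistogram
    private final int spoolSize;       // larger temporary buffer, e.g. 100000
    private final int roundSeconds;    // e.g. 60, to collapse near-identical drop times

    private final TreeMap<Double, Long> bin = new TreeMap<>();
    private final TreeMap<Double, Long> spool = new TreeMap<>();

    public SpooledHistogram(int maxBinSize, int spoolSize, int roundSeconds)
    {
        this.maxBinSize = maxBinSize;
        this.spoolSize = spoolSize;
        this.roundSeconds = roundSeconds;
    }

    public void update(long dropTimeSeconds)
    {
        // Round the deletion time up to the next roundSeconds boundary, so a
        // day's worth of data written with one default TTL hits few distinct keys.
        double key = Math.ceil((double) dropTimeSeconds / roundSeconds) * roundSeconds;
        spool.merge(key, 1L, Long::sum);
        if (spool.size() >= spoolSize)
            flush();
    }

    private void flush()
    {
        for (Map.Entry<Double, Long> e : spool.entrySet())
            bin.merge(e.getKey(), e.getValue(), Long::sum);
        spool.clear();
        // Repeatedly merge the two closest bins until we are back under the cap,
        // instead of trimming on every single update.
        while (bin.size() > maxBinSize)
        {
            double prev = Double.NaN, mergeLo = 0, mergeHi = 0, smallestGap = Double.MAX_VALUE;
            for (double p : bin.keySet())
            {
                if (!Double.isNaN(prev) && p - prev < smallestGap)
                {
                    smallestGap = p - prev;
                    mergeLo = prev;
                    mergeHi = p;
                }
                prev = p;
            }
            long loCount = bin.remove(mergeLo);
            long hiCount = bin.remove(mergeHi);
            // Weighted midpoint keeps the merged bin centred on the combined mass.
            double merged = (mergeLo * loCount + mergeHi * hiCount) / (loCount + hiCount);
            bin.merge(merged, loCount + hiCount, Long::sum);
        }
    }

    public Map<Double, Long> snapshot()
    {
        flush();                       // fold any pending spool entries in before reading
        return new TreeMap<>(bin);
    }
}
{code}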

> 33% of compaction time spent in StreamingHistogram.update()
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-13038
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13038
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Corentin Chary
>            Assignee: Jeff Jirsa
>         Attachments: compaction-speedup.patch, 
> compaction-streaminghistrogram.png, profiler-snapshot.nps
>
>
> With the following table, which contains a *lot* of cells: 
> {code}
> CREATE TABLE biggraphite.datapoints_11520p_60s (
>     metric uuid,
>     time_start_ms bigint,
>     offset smallint,
>     count int,
>     value double,
>     PRIMARY KEY ((metric, time_start_ms), offset)
> ) WITH CLUSTERING ORDER BY (offset DESC)
>   AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', 
> 'max_threshold': '32', 'min_threshold': '6'};
> Keyspace : biggraphite
>         Read Count: 1822
>         Read Latency: 1.8870054884742042 ms.
>         Write Count: 2212271647
>         Write Latency: 0.027705127678653473 ms.
>         Pending Flushes: 0
>                 Table: datapoints_11520p_60s
>                 SSTable count: 47
>                 Space used (live): 300417555945
>                 Space used (total): 303147395017
>                 Space used by snapshots (total): 0
>                 Off heap memory used (total): 207453042
>                 SSTable Compression Ratio: 0.4955200053039823
>                 Number of keys (estimate): 16343723
>                 Memtable cell count: 220576
>                 Memtable data size: 17115128
>                 Memtable off heap memory used: 0
>                 Memtable switch count: 2872
>                 Local read count: 0
>                 Local read latency: NaN ms
>                 Local write count: 1103167888
>                 Local write latency: 0.025 ms
>                 Pending flushes: 0
>                 Percent repaired: 0.0
>                 Bloom filter false positives: 0
>                 Bloom filter false ratio: 0.00000
>                 Bloom filter space used: 105118296
>                 Bloom filter off heap memory used: 106547192
>                 Index summary off heap memory used: 27730962
>                 Compression metadata off heap memory used: 73174888
>                 Compacted partition minimum bytes: 61
>                 Compacted partition maximum bytes: 51012
>                 Compacted partition mean bytes: 7899
>                 Average live cells per slice (last five minutes): NaN
>                 Maximum live cells per slice (last five minutes): 0
>                 Average tombstones per slice (last five minutes): NaN
>                 Maximum tombstones per slice (last five minutes): 0
>                 Dropped Mutations: 0
> {code}
> It looks like a good chunk of the compaction time is lost in 
> StreamingHistogram.update() (which is used to store the estimated tombstone 
> drop times).
> This could be caused by a huge number of distinct deletion times, which would 
> make the bins huge, but this histogram should be capped at 100 keys. It's 
> more likely caused by the huge number of cells.
> A simple solution could be to only take a fraction of the cells into account; 
> the fact that this table uses TWCS is an additional hint that sampling 
> deletion times would be fine.
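
For illustration only, a minimal sketch of the sampling idea suggested in the 
description above. The SampledDropTimeRecorder class, the sampleEvery 
parameter, and the StreamingHistogramLike interface are all hypothetical stand-ins, 
not Cassandra APIs: the point is simply to feed only a fraction of the cells' 
deletion times into the histogram and compensate with a weight.

{code}
import java.util.concurrent.ThreadLocalRandom;

public class SampledDropTimeRecorder
{
    private final int sampleEvery;     // e.g. 128: record roughly 1 in 128 cells

    public SampledDropTimeRecorder(int sampleEvery)
    {
        this.sampleEvery = sampleEvery;
    }

    public void record(StreamingHistogramLike histogram, long localDeletionTimeSeconds)
    {
        // Random sampling avoids systematically skipping cells that arrive in
        // sorted order, as they do while compacting TWCS sstables.
        if (ThreadLocalRandom.current().nextInt(sampleEvery) == 0)
            histogram.update(localDeletionTimeSeconds, sampleEvery); // weight compensates for skipped cells
    }

    /** Stand-in for a histogram with a weighted update; the real StreamingHistogram API may differ. */
    public interface StreamingHistogramLike
    {
        void update(long point, long weight);
    }
}
{code}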



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
