[ https://issues.apache.org/jira/browse/CASSANDRA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858245#comment-15858245 ]
Jeff Jirsa commented on CASSANDRA-13038: ---------------------------------------- [~slebresne] - my main objection is that I know people (not me) are using 5 minute TWCS windows, and presumably, they're doing so because they need to expire in 5 minute chunks. Rounding up to 1 hour TTL histograms would eliminate that use case, and it's not a hypothetical use case. There is a lower bound where it becomes much less useful (maybe it's 5 minutes, or 1 minute), but it's not 1 hour. In any case, I'll take the ticket. I had implemented one method using a larger temporary spool of bins that we merge/compact down into the other set on use, and it's about 3-5x faster without further loss of resolution. I'll extend that to add 60 second rounding and see what that does - I suspect it'll be significant, especially in the cases where we're dealing with a day (or week, or month) worth of data inserted with a default TTL. Depending on how that looks, we can decide if 60s or 300s is the right rounding point. > 33% of compaction time spent in StreamingHistogram.update() > ----------------------------------------------------------- > > Key: CASSANDRA-13038 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13038 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Reporter: Corentin Chary > Assignee: Jeff Jirsa > Attachments: compaction-speedup.patch, > compaction-streaminghistrogram.png, profiler-snapshot.nps > > > With the following table, that contains a *lot* of cells: > {code} > CREATE TABLE biggraphite.datapoints_11520p_60s ( > metric uuid, > time_start_ms bigint, > offset smallint, > count int, > value double, > PRIMARY KEY ((metric, time_start_ms), offset) > ) WITH CLUSTERING ORDER BY (offset DESC); > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', > 'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', > 'max_threshold': '32', 'min_threshold': '6'} > Keyspace : biggraphite > Read Count: 1822 > Read Latency: 1.8870054884742042 ms. > Write Count: 2212271647 > Write Latency: 0.027705127678653473 ms. > Pending Flushes: 0 > Table: datapoints_11520p_60s > SSTable count: 47 > Space used (live): 300417555945 > Space used (total): 303147395017 > Space used by snapshots (total): 0 > Off heap memory used (total): 207453042 > SSTable Compression Ratio: 0.4955200053039823 > Number of keys (estimate): 16343723 > Memtable cell count: 220576 > Memtable data size: 17115128 > Memtable off heap memory used: 0 > Memtable switch count: 2872 > Local read count: 0 > Local read latency: NaN ms > Local write count: 1103167888 > Local write latency: 0.025 ms > Pending flushes: 0 > Percent repaired: 0.0 > Bloom filter false positives: 0 > Bloom filter false ratio: 0.00000 > Bloom filter space used: 105118296 > Bloom filter off heap memory used: 106547192 > Index summary off heap memory used: 27730962 > Compression metadata off heap memory used: 73174888 > Compacted partition minimum bytes: 61 > Compacted partition maximum bytes: 51012 > Compacted partition mean bytes: 7899 > Average live cells per slice (last five minutes): NaN > Maximum live cells per slice (last five minutes): 0 > Average tombstones per slice (last five minutes): NaN > Maximum tombstones per slice (last five minutes): 0 > Dropped Mutations: 0 > {code} > It looks like a good chunk of the compaction time is lost in > StreamingHistogram.update() (which is used to store the estimated tombstone > drop times). > This could be caused by a huge number of different deletion times which would > makes the bin huge but it this histogram should be capped to 100 keys. It's > more likely caused by the huge number of cells. > A simple solutions could be to only take into accounts part of the cells, the > fact the this table has a TWCS also gives us an additional hint that sampling > deletion times would be fine. -- This message was sent by Atlassian JIRA (v6.3.15#6346)