[
https://issues.apache.org/jira/browse/CASSANDRA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuki Morishita updated CASSANDRA-3442:
--------------------------------------
Attachment: 3442-track-tombstones.txt
Patch attached. It tracks tombstones by building a histogram of their drop
times. The size-tiered compaction strategy uses this histogram to estimate the
fraction of droppable tombstones at compaction time, and performs a
single-sstable compaction if that fraction exceeds the threshold.
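For illustration only, here is a minimal sketch of that decision. It is not the
code in the attached patch: the class name, the method names, and the use of an
exact TreeMap in place of a real estimated histogram are all assumptions made
to keep the example self-contained.

```java
import java.util.TreeMap;

// Hypothetical sketch, not taken from the attached patch.
public class DroppableTombstoneSketch {
    // Drop time (local deletion time, in seconds) -> number of tombstones.
    // An exact map stands in for the sstable's drop time histogram.
    private final TreeMap<Integer, Long> dropTimes = new TreeMap<>();
    private final long totalColumns;

    public DroppableTombstoneSketch(long totalColumns) {
        this.totalColumns = totalColumns;
    }

    // Record one tombstone with the given local deletion time.
    public void record(int localDeletionTime) {
        dropTimes.merge(localDeletionTime, 1L, Long::sum);
    }

    // Fraction of columns that are tombstones with a drop time before gcBefore.
    public double droppableRatio(int gcBefore) {
        if (totalColumns == 0) {
            return 0.0;
        }
        long droppable = 0;
        for (long count : dropTimes.headMap(gcBefore, false).values()) {
            droppable += count;
        }
        return (double) droppable / totalColumns;
    }

    // True if the sstable is worth compacting on its own.
    public boolean shouldCompactAlone(int gcBefore, double threshold) {
        return droppableRatio(gcBefore) > threshold;
    }
}
```

The threshold is passed in by the caller because the right value is still an
open question in this ticket.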
Note that the original patch overcounted ExpiringColumns inside a SuperColumn.
The overall column count is taken at the SuperColumn level, so the tombstone
count should be taken at the same level. The newer patch counts tombstones by
simply checking whether the local deletion time is < Integer.MAX_VALUE.
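As a rough sketch of that check (not the patch itself): the class and method
names below are hypothetical, and the example assumes that a column which is
neither deleted nor expiring reports Integer.MAX_VALUE as its local deletion
time.

```java
// Hypothetical sketch, not taken from the attached patch.
public class TombstoneCountSketch {
    // Assumption: columns with no deletion or expiry report Integer.MAX_VALUE.
    static boolean countsAsTombstone(int localDeletionTime) {
        return localDeletionTime < Integer.MAX_VALUE;
    }

    // Count the entries that carry a real local deletion time. Each entry
    // stands for one unit at the level where columns are counted.
    static int countTombstones(int[] localDeletionTimes) {
        int count = 0;
        for (int t : localDeletionTimes) {
            if (countsAsTombstone(t)) {
                count++;
            }
        }
        return count;
    }
}
```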
I also rewrote the unit test to simply create one sstable with tombstones and
let it get compacted.
> TTL histogram for sstable metadata
> ----------------------------------
>
> Key: CASSANDRA-3442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3442
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Yuki Morishita
> Priority: Minor
> Labels: compaction
> Fix For: 1.2
>
> Attachments: 3442-track-tombstones.txt, 3442-v3.txt,
> cassandra-1.1-3442.txt
>
>
> Under size-tiered compaction, you can generate large sstables that compact
> infrequently. With expiring columns mixed in, we could waste a lot of space
> in this situation.
> If we kept a TTL EstimatedHistogram in the sstable metadata, we could do a
> single-sstable compaction against sstables with over 20% (?) expired data.