[
https://issues.apache.org/jira/browse/CASSANDRA-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049671#comment-16049671
]
Pedro Gordo commented on CASSANDRA-12201:
-----------------------------------------
I've squashed several commits and added everything to a fork of the main
Cassandra repo. You can find it here:
https://github.com/sedulam/cassandra/tree/12201
I believe you can now easily compare my changes against the code base. Please
let me know if this should be done differently.
> Burst Hour Compaction Strategy
> ------------------------------
>
> Key: CASSANDRA-12201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12201
> Project: Cassandra
> Issue Type: New Feature
> Components: Compaction
> Reporter: Pedro Gordo
> Attachments: BHCS outline.pdf
>
> Original Estimate: 1,008h
> Remaining Estimate: 1,008h
>
> The motivation for this strategy is to take advantage of the periods of the
> day when there is less I/O on the cluster. This time of day will be called
> “Burst Hour” (BH), hence the name “Burst Hour Compaction Strategy” (BHCS).
> The following process would be triggered during BH:
> 1. Read all the SSTables and detect which partition keys are present in more
> SSTables than the minimum compaction threshold.
> 2. Gather all the SSTables that have keys also present in other SSTables,
> where the number of copies of a key is at least the minimum compaction
> threshold.
> 3. Repeat step 2 until the bucket of gathered SSTables reaches the maximum
> compaction threshold (32 by default), or until we've searched all the keys.
> 4. The compaction itself will be done by MaxSSTableSizeWriter. The
> compacted tables will have a maximum size equal to the configurable value of
> max_sstable_size (100MB by default).
> The maximum compaction task (nodetool compact command) performs exactly the
> same operation as the background compaction task, except that it can be
> triggered outside of the Burst Hour.
> This strategy tries to address three issues of the existing compaction
> strategies:
> - Due to max_sstable_size_limit, there's no need to reserve disc space for a
> huge compaction.
> - The number of SSTables that we need to read from to reply to a read query
> will be consistently maintained at a low level and controllable through the
> referenced_sstable_limit property.
> - It removes the dependency on continuously high I/O.
> Possible future improvements:
> - Continuously evaluate the number of pending compactions and the I/O status,
> and based on that decide whether or not to start the compaction.
> - If, during the day, the total size of all the SSTables in a family set
> reaches a certain maximum, then background compaction can occur anyway. This
> maximum should be set high due to the high CPU usage of BHCS.
> - Make it possible to set several compaction time intervals, instead of just
> one.
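
For readers skimming the description, steps 1 to 3 of the quoted process could
be sketched roughly as below. This is only an illustration in Python, not the
code in the fork: the function name select_bucket, the dict-of-sets input and
the default min_threshold of 4 are assumptions made for the example, and the
"at least min_threshold copies" test follows the wording of step 2.

```python
from collections import defaultdict


def select_bucket(sstables, min_threshold=4, max_threshold=32):
    """Pick SSTables to compact together (hypothetical helper).

    sstables: dict mapping an SSTable name to the set of partition keys
    it contains. Returns a list of SSTable names.
    """
    # Step 1: index which SSTables hold each partition key.
    holders = defaultdict(set)
    for name, keys in sstables.items():
        for key in keys:
            holders[key].add(name)

    # Steps 2-3: for every key held by enough SSTables, gather those
    # SSTables, stopping once the bucket reaches max_threshold.
    bucket = []
    seen = set()
    for key in sorted(holders):  # sorted only to make the result deterministic
        names = holders[key]
        if len(names) < min_threshold:
            continue
        for name in sorted(names):
            if name in seen:
                continue
            seen.add(name)
            bucket.append(name)
            if len(bucket) >= max_threshold:
                return bucket
    return bucket


if __name__ == "__main__":
    example = {
        "a": {"k1", "k2"},
        "b": {"k1"},
        "c": {"k1", "k3"},
        "d": {"k3"},
    }
    # k1 is in three SSTables, so with min_threshold=3 only a, b, c qualify.
    print(select_bucket(example, min_threshold=3))
```

The bucket returned here would then be handed to the size-limited writer from
step 4; that part is left out because it depends on Cassandra internals.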