[
https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227003#comment-15227003
]
Marcus Eriksson commented on CASSANDRA-11407:
---------------------------------------------
Leaving this open, but I think we might merge CASSANDRA-9666 instead.
> Proposal for simplified DTCS
> ----------------------------
>
> Key: CASSANDRA-11407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Anubhav Kale
> Labels: dtcs
> Attachments: 0001-Simple-DTCS.patch
>
>
> Today's DTCS implementation has been discussed and debated in a few JIRAs
> already (the notable one is
> https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main
> challenges with the current approach is that it is very difficult to reason
> about how the "Target" class makes buckets, thus making it difficult to
> reason about the expected file layout on disk.
> I am proposing a simplification of the current approach that keeps most of
> the DTCS properties that make it a great fit for time-series data. The
> simplification is as follows.
> Given the min and max timestamps across all SSTables in question, start from
> min and make windows based on base and min_threshold. The logic in GetWindow
> simply tries to fit maximum-sized windows from min to max.
> This keeps the DTCS properties intact except that we don't need to wait for
> min_threshold windows before making a bigger one. I would argue this
> simplifies the algorithm to a great extent and makes it easy to reason about,
> and the end result isn't drastically different from the original DTCS in most
> cases.
> We give up on the "alignment" logic that exists in the current
> implementation, but I honestly don't think it buys us a lot besides complexity.
> The implementation can obviously be optimized and cleaned up more if folks
> think this is a good idea.
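
For readers without the attached patch at hand, the windowing described in
the quoted proposal can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in
0001-Simple-DTCS.patch: it assumes integer timestamps, that "base" is the
smallest window size, that windows grow by a factor of min_threshold (as
window sizes do in existing DTCS), and that the final window may extend past
max. The function name make_windows is hypothetical.

```python
def make_windows(min_ts, max_ts, base, min_threshold):
    """Cover [min_ts, max_ts] with half-open (start, end) windows.

    Hypothetical sketch: starting at min_ts, repeatedly take the largest
    window of size base * min_threshold**k that still fits before max_ts.
    If not even a base-sized window fits, a base-sized window is used anyway.
    """
    if base <= 0 or min_threshold < 2:
        raise ValueError("base must be > 0 and min_threshold must be >= 2")
    if max_ts < min_ts:
        raise ValueError("max_ts must be >= min_ts")

    windows = []
    start = min_ts
    while start <= max_ts:
        size = base
        # Grow the window while the next larger size still ends at or
        # before max_ts (inclusive, hence the + 1 for a half-open end).
        while start + size * min_threshold <= max_ts + 1:
            size *= min_threshold
        windows.append((start, start + size))
        start += size
    return windows


if __name__ == "__main__":
    for window in make_windows(0, 99, 1, 4):
        print(window)
```

Under these assumptions the oldest data lands in the largest window and the
windows shrink toward max, with no alignment step and no waiting for
min_threshold windows of one size before a bigger one is formed.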
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)