[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051345#comment-13051345 ]
Yang Yang edited comment on CASSANDRA-2735 at 6/17/11 9:52 PM:
---------------------------------------------------------------

There could be a problem with relying on a forced compaction order to make counter expiration work. If you base the intended order on the max timestamp of each sstable, that timestamp is not trustworthy: a single malicious client request can push its timestamp into the future and arbitrarily change the order of compaction, rendering the approach in 2735 useless. You can't base the order on the physical sstable flush time either, since different nodes flush at different times.

Overall, I think fixing the compaction order is not the right way to attack this problem. The issue here is caused by the changing order between *individual* counter adds/deletes (auto-expiration is the same as a delete). This order can differ from one counter to another, so you have to fix the order between the updates within each counter, not the order between *ensembles of counters*. Such ensembles of counters do not guarantee any order at all, because of randomness in flush timing or in message delivery (the two have similar effects).

The problem with the current counter+delete implementation is that counters use timestamp() to represent their order, but when they are merged, they lose their *individual order* and retain only a max timestamp(), which supposedly represents the order of the ensemble. This is meaningless, because the order of the ensemble is different from the true order of the individual updates.

> Timestamp Based Compaction Strategy
> -----------------------------------
>
>                 Key: CASSANDRA-2735
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Alan Liang
>            Assignee: Alan Liang
>            Priority: Minor
>              Labels: compaction
>         Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the
> sstables while satisfying max sstable size, min and max compaction
> thresholds. It also handles expiration of sstables based on a timestamp.
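To make the two ordering problems in the comment concrete, here is a minimal Java sketch (Java 16+ for records). It is a hypothetical, simplified model and not Cassandra's counter or compaction code: the names `CounterMergeOrderSketch`, `Add`, `MergedCounter`, `SSTable` and all of the methods are made up for illustration. It assumes a merged counter keeps only its summed value plus the max timestamp of the increments folded into it, as the comment describes, and it reduces an sstable to the max timestamp of the cells it holds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model for illustration only; these are NOT Cassandra classes.
final class CounterMergeOrderSketch {

    // One counter increment tagged with its (client-supplied) timestamp.
    record Add(long timestamp, long delta) {}

    // A counter after merging: the summed value plus only the max timestamp.
    record MergedCounter(long total, long maxTimestamp) {}

    // An sstable reduced to the max timestamp of the cells it holds.
    record SSTable(String name, long maxTimestamp) {}

    // Fold increments together: deltas are summed, individual timestamps are lost.
    static MergedCounter merge(List<Add> adds) {
        long total = 0;
        long max = Long.MIN_VALUE;
        for (Add a : adds) {
            total += a.delta();
            max = Math.max(max, a.timestamp());
        }
        return new MergedCounter(total, max);
    }

    // Delete (or expire) a merged counter: all or nothing, decided by the max timestamp.
    static long applyDeleteToMerged(MergedCounter c, long deleteTs) {
        return c.maxTimestamp() > deleteTs ? c.total() : 0;
    }

    // The per-update answer: only increments newer than the delete survive.
    static long applyDeletePerUpdate(List<Add> adds, long deleteTs) {
        long total = 0;
        for (Add a : adds) {
            if (a.timestamp() > deleteTs) {
                total += a.delta();
            }
        }
        return total;
    }

    // Order sstables oldest-first by max timestamp and return their names.
    static List<String> orderByMaxTimestamp(List<SSTable> tables) {
        List<SSTable> sorted = new ArrayList<>(tables);
        sorted.sort(Comparator.comparingLong(SSTable::maxTimestamp));
        List<String> names = new ArrayList<>();
        for (SSTable t : sorted) {
            names.add(t.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // +5 at t=1, delete at t=2, +2 at t=3: per-update ordering gives 2.
        List<Add> adds = List.of(new Add(1, 5), new Add(3, 2));
        System.out.println(applyDeletePerUpdate(adds, 2));       // prints 2
        // If both adds were merged before the delete is seen, max timestamp 3 beats the delete.
        System.out.println(applyDeleteToMerged(merge(adds), 2)); // prints 7

        List<SSTable> tables = List.of(
                new SSTable("a", 100), new SSTable("b", 200), new SSTable("c", 300));
        System.out.println(orderByMaxTimestamp(tables));   // prints [a, b, c]
        // One write with a far-future timestamp lands in "a" and moves it to the end.
        List<SSTable> hijacked = List.of(
                new SSTable("a", 10_000), new SSTable("b", 200), new SSTable("c", 300));
        System.out.println(orderByMaxTimestamp(hijacked)); // prints [b, c, a]
    }
}
```

The first pair of calls shows the per-counter problem: the same three operations yield 2 or 7 depending on whether the adds were merged before the delete was applied, because the merge keeps only the max timestamp. The second pair shows why max-timestamp ordering of sstables is fragile: a single future-timestamped write is enough to reorder them.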