I'd suggest creating 1 table per day, and dropping the tables you don't need once you're done.
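The table-per-day scheme can be sketched in a few lines: derive a table name from each day and drop any table that falls outside the retention window. This is a minimal illustration, assuming a hypothetical `audit_YYYYMMDD` naming convention (the actual CREATE/DROP would be issued as CQL):

```python
from datetime import date, timedelta

def daily_table_name(day):
    """Name of the per-day audit table, e.g. audit_20140101 (naming is hypothetical)."""
    return "audit_" + day.strftime("%Y%m%d")

def tables_to_drop(existing, today, retention_days):
    """Return the per-day tables older than the retention window.

    The zero-padded date suffix makes lexicographic comparison equal
    to chronological comparison, so a plain string compare suffices.
    """
    cutoff = daily_table_name(today - timedelta(days=retention_days))
    return sorted(t for t in existing if t < cutoff)

# Five days of tables, a 7-day retention window, "today" = Jan 10:
tables = [daily_table_name(date(2014, 1, 1) + timedelta(days=i)) for i in range(5)]
old = tables_to_drop(tables, date(2014, 1, 10), 7)  # the two days outside the window
```

Dropping a whole table removes all of its sstables on every replica at once, which sidesteps the per-sstable consistency questions discussed below.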
On Wed, Jun 4, 2014 at 10:44 AM, Redmumba <redmu...@gmail.com> wrote:

> Sorry, yes, that is what I was looking to do--i.e., create a "TopologicalCompactionStrategy" or similar.
>
> On Wed, Jun 4, 2014 at 10:40 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:
>
>> Maybe I’m misunderstanding something, but what makes you think that running a major compaction every day will cause the data from January 1st to exist in only one SSTable and not have data from other days in the SSTable as well? Are you talking about making a new compaction strategy that creates SSTables by day?
>>
>> On June 4, 2014 at 1:36:10 PM, Redmumba (redmu...@gmail.com) wrote:
>>
>> Let's say I run a major compaction every day, so that the "oldest" sstable contains only the data for January 1st. Assuming all the nodes are in sync and have had at least one repair run before the table is dropped (so that all information for that time period is "the same"), wouldn't it be safe to assume that the same data would be dropped on all nodes? There might be a period while the compaction is running where different nodes have an inconsistent view of just that day's data (in that some would have it and others would not), but the cluster would still function and become eventually consistent, correct?
>>
>> Also, if the entirety of the sstable is being dropped, wouldn't the tombstones be removed with it? I wouldn't be concerned with individual rows and columns, and this is a write-only table, more or less--the only deletes that occur in the current system are to delete the old data.
>>
>> On Wed, Jun 4, 2014 at 10:24 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:
>>
>>> I’m not sure what you want to do is feasible. At a high level I can see you running into issues with RF etc.
>>> The SSTables node to node are not identical, so if you drop a full SSTable on one node there is no corresponding SSTable on the adjacent nodes to drop. You would need to choose data to compact out, and ensure it is removed on all replicas as well. But if your problem is that you’re low on disk space, then you probably won’t be able to write out a new SSTable with the older information compacted out. Also, there is more to an SSTable than just data; the SSTable could have tombstones and other relics that haven’t been cleaned up from nodes coming or going.
>>>
>>> On June 4, 2014 at 1:10:58 PM, Redmumba (redmu...@gmail.com) wrote:
>>>
>>> Thanks, Russell--yes, a similar concept, just applied to sstables. I'm assuming this would require changes to both major compactions and probably GC (to remove the old tables), but since I'm not super-familiar with the C* internals, I wanted to make sure it was feasible with the current toolset before I actually dived in and started tinkering.
>>>
>>> Andrew
>>>
>>> On Wed, Jun 4, 2014 at 10:04 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:
>>>
>>>> hmm, I see. So something similar to Capped Collections in MongoDB.
>>>>
>>>> On June 4, 2014 at 1:03:46 PM, Redmumba (redmu...@gmail.com) wrote:
>>>>
>>>> Not quite; if I'm at, say, 90% disk usage, I'd like to drop the oldest sstable rather than simply run out of space.
>>>>
>>>> The problem with using TTLs is that I have to try and guess how much data is being put in--since this is auditing data, the usage can vary wildly depending on time of year, verbosity of auditing, etc. I'd like to maximize the disk space--not optimize the cleanup process.
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Jun 4, 2014 at 9:47 AM, Russell Bradberry <rbradbe...@gmail.com> wrote:
>>>>
>>>>> You mean this:
>>>>>
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-5228
>>>>>
>>>>> ?
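The capped-collection-style behavior Redmumba describes (drop the oldest sstable once disk usage crosses a watermark, rather than guessing at TTLs) can be sketched as a simple FIFO eviction policy. This is a toy model, not Cassandra code; the `(created_ts, size_bytes)` tuples and the function name are hypothetical:

```python
def enforce_cap(sstables, capacity_bytes, high_watermark=0.9):
    """FIFO cap: drop whole sstables, oldest first, until total size
    falls to or below the high watermark of capacity.

    sstables is a list of (created_ts, size_bytes) tuples; fields are
    hypothetical stand-ins for sstable metadata.
    """
    live = sorted(sstables)               # oldest first by creation time
    dropped = []
    while live and sum(size for _, size in live) > high_watermark * capacity_bytes:
        dropped.append(live.pop(0))       # evict the entire oldest sstable
    return live, dropped

# 110 bytes of sstables on a 100-byte disk with a 90% watermark:
live, dropped = enforce_cap([(3, 30), (1, 40), (2, 40)], capacity_bytes=100)
```

Note this only makes sense if, as discussed above, each sstable is known to hold exactly one time slice and replicas agree on its contents; otherwise dropping "the oldest file" evicts different data on different nodes.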
>>>>> On June 4, 2014 at 12:42:33 PM, Redmumba (redmu...@gmail.com) wrote:
>>>>>
>>>>> Good morning!
>>>>>
>>>>> I've asked (and seen other people ask) about the ability to drop old sstables, basically creating a FIFO-like clean-up process. Since we're using Cassandra as an auditing system, this is particularly appealing to us because it means we can maximize the amount of auditing data we can keep while still allowing Cassandra to clear old data automatically.
>>>>>
>>>>> My idea is this: perform compaction based on the range of dates available in the sstable (or just metadata about when it was created). For example, a major compaction could create a combined sstable per day--so that, say, 60 days of data after a major compaction would contain 60 sstables.
>>>>>
>>>>> My question then is, will this be possible by simply implementing a separate AbstractCompactionStrategy? Does this sound feasible at all? Based on the implementation of the Size and Leveled strategies, it looks like I would have the ability to control what and how things get compacted, but I wanted to verify before putting time into it.
>>>>>
>>>>> Thank you so much for your time!
>>>>>
>>>>> Andrew

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
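The selection logic such a date-based strategy would need (before touching any Cassandra internals) can be sketched as: group sstables by the calendar day of their data, then hand each multi-sstable day to the compactor so every day collapses into one sstable. A toy model, assuming hypothetical `(name, max_timestamp)` metadata in epoch seconds:

```python
from datetime import datetime, timezone

def candidate_buckets(sstables):
    """Group sstables by the UTC calendar day of their newest data.

    A date-based compaction strategy would submit each bucket with more
    than one sstable for compaction, so that after a full pass each day
    is contained in exactly one sstable. Names and fields here are
    hypothetical stand-ins for real sstable metadata.
    """
    buckets = {}
    for name, max_timestamp in sstables:
        day = datetime.fromtimestamp(max_timestamp, tz=timezone.utc).date()
        buckets.setdefault(day, []).append(name)
    # Only days currently spread across several sstables need work.
    return {day: names for day, names in buckets.items() if len(names) > 1}

# Two sstables from Jan 1 (one bucket to compact) and one from Jan 2 (left alone):
buckets = candidate_buckets([("sst-1", 1388534400),   # 2014-01-01 00:00 UTC
                             ("sst-2", 1388538000),   # 2014-01-01 01:00 UTC
                             ("sst-3", 1388620800)])  # 2014-01-02 00:00 UTC
```

The hard parts Russell raises remain outside this sketch: real sstables can straddle day boundaries, and the per-node sstable layouts differ, so the strategy would have to split or rewrite data rather than merely regroup files.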