If you make the timestamp the partition key you won't be able to do range
queries (unless you use an ordered partitioner).
Assuming you are logging from multiple devices, you will want your partition key
to be the device id and the date, and your clustering key to be the timestamp
(timeuuids are good).
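
A minimal sketch of that layout, for illustration only. The table name `logs`, the column names, and the one-day bucket size are assumptions and not something from this thread. The helpers show how a time-range read maps onto a small, known set of (device id, date) partitions, each of which can then be sliced by the timeuuid clustering column.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: table name, column names and day-sized buckets are
# illustrative assumptions. The partition key is (device_id, day) and the
# clustering key is a timeuuid.
CREATE_TABLE = """
CREATE TABLE logs (
    device_id text,
    day text,
    ts timeuuid,
    line text,
    PRIMARY KEY ((device_id, day), ts)
)
"""


def day_bucket(ts):
    """Return the date part of the partition key (UTC) for an aware datetime."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partitions_for_range(device_id, start, end):
    """List the (device_id, day) partitions a time-range read has to touch."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((device_id, day.isoformat()))
        day += timedelta(days=1)
    return keys
```

With this shape a reader issues one range query on `ts` per partition returned by `partitions_for_range`, so no ordered partitioner is needed.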
If the data is read from a slice of a partition that has been added to over
time, there will be a part of that row in almost every sstable. That would
mean all of them (multiple disk seeks depending on clustering order per
sstable) would have to be read from in order to service the query. Data
The following article has some good information for what you describe:
http://www.datastax.com/dev/blog/optimizations-around-cold-sstables
Some related tickets which will provide background:
https://issues.apache.org/jira/browse/CASSANDRA-5228
https://issues.apache.org/jira/browse/CASSANDRA-5515
What does your data model look like?
I think it would be best to just disable compactions.
Why? Are you never doing reads? There is also a cost to repairs/bootstrapping
when you have a ton of sstables. This might be a premature optimization.
Hello Kevin
You can disable compaction by configuring the compaction options of your
table as follows:
compaction={'min_threshold': '0', 'class':
'SizeTieredCompactionStrategy', 'max_threshold': '0'}
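
For reference, here is a small sketch of how that option map could be turned into a full ALTER TABLE statement. The helper `alter_compaction_cql` and the table name `logs` are hypothetical. The option values are copied from the map above, and whether zero thresholds actually disable compaction on your Cassandra version is something to verify before relying on it.

```python
def alter_compaction_cql(table, options):
    """Render an ALTER TABLE statement that sets the compaction option map."""
    pairs = ", ".join("'%s': '%s'" % (key, value) for key, value in options.items())
    return "ALTER TABLE %s WITH compaction = {%s};" % (table, pairs)


# Options copied from the message above; 'logs' is a hypothetical table name.
NO_COMPACTION = {
    "class": "SizeTieredCompactionStrategy",
    "min_threshold": "0",
    "max_threshold": "0",
}

statement = alter_compaction_cql("logs", NO_COMPACTION)
```

The resulting string can be pasted into cqlsh or passed to whatever driver you use.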
Regards
Duy Hai DOAN
On Wed, May 7, 2014 at 2:55 AM, Kevin Burton bur...@spinn3r.com
I'm looking at storing log data in Cassandra…
Every record is a unique timestamp for the key, and then the log line for
the value.
I think it would be best to just disable compactions.
- there will never be any deletes.
- all the data will be accessed in time range (probably partitioned