Column families for "most recent data", (a.k.a. size-safe wide rows)
--------------------------------------------------------------------
Key: CASSANDRA-3999
URL: https://issues.apache.org/jira/browse/CASSANDRA-3999
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Ahmet AKYOL
"Wide row design" is very handy for time series data, but we have to keep each
row at an acceptable size. So we need buckets, right? Monthly, daily or even
hourly buckets... The problem with the bucket approach is, as always, that the
data ends up unevenly distributed across rows.
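
To make the bucketing problem concrete, here is a minimal sketch in Python. It
is not Cassandra code: `bucket_key` and `row_sizes` are hypothetical helpers,
and the "series:period" key layout is only an assumption about how an
application might name its bucket rows.

```python
from collections import Counter
from datetime import datetime

# Hypothetical key layouts for monthly, daily and hourly buckets.
BUCKET_FORMATS = {"month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H"}

def bucket_key(series_id, ts, granularity="day"):
    """Build the row key of the bucket that a timestamp falls into."""
    return f"{series_id}:{ts.strftime(BUCKET_FORMATS[granularity])}"

def row_sizes(series_id, timestamps, granularity="day"):
    """Count how many columns each bucket row would receive."""
    return Counter(bucket_key(series_id, ts, granularity) for ts in timestamps)
```

Feeding bursty timestamps into `row_sizes` shows the issue: a fixed time
granularity bounds the time span of a row, not its size, so a busy period still
produces a very wide row while a quiet one produces a nearly empty row.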
So why not tell Cassandra that we want a column family that behaves like an LRU
cache, but on disk? If we start the design from the queries, we usually end up
with "most recent data" queries. This "size-safe wide rows" approach could be
very useful in many use cases.
Here are some hypothetical column family storage parameters:
max_column_number_hint : 1000 // meaning: try to keep around 1000 columns.
Since it's a hint, we (users) are OK with tombstones or with a count anywhere
in the 800 - 1200 range
or
max_row_size_hint : 1MB
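
One possible reading of the column-count hint, sketched in Python as an
in-memory model. Nothing here is a real Cassandra API: the `SizeSafeRow` class
and its `tolerance` parameter are assumptions made for illustration, with the
default tolerance of 0.2 chosen to match the 800 - 1200 range above. The row is
allowed to drift above the hint and is trimmed back to it only once the upper
bound is crossed.

```python
class SizeSafeRow:
    """Toy model of a wide row that keeps roughly the newest N columns."""

    def __init__(self, max_column_number_hint=1000, tolerance=0.2):
        self.hint = max_column_number_hint
        # Upper bound of the accepted range, e.g. 1200 for a hint of 1000.
        self.upper = round(max_column_number_hint * (1 + tolerance))
        self.columns = {}  # column name -> (timestamp, value)

    def insert(self, name, timestamp, value):
        """Write a column; the newest timestamp wins for a given name."""
        current = self.columns.get(name)
        if current is None or timestamp >= current[0]:
            self.columns[name] = (timestamp, value)
        # Trim lazily: only once the row has drifted past the upper bound.
        if len(self.columns) > self.upper:
            self.trim()

    def trim(self):
        """Drop the oldest columns until the row is back at the hint."""
        excess = len(self.columns) - self.hint
        if excess <= 0:
            return 0
        oldest = sorted(self.columns, key=lambda n: self.columns[n][0])[:excess]
        for name in oldest:
            del self.columns[name]
        return excess
```

Trimming in batches rather than on every write is the point of calling the
parameter a hint: the row size oscillates between the hint and the upper bound
instead of being enforced exactly.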
I don't know the Cassandra internals, but C* already has background jobs (for
compaction, deletion and TTL) and columns already have timestamps. So it makes
sense both from the user's point of view and from C*'s.
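
As an illustration of how a background pass could use column timestamps to
honour a size hint, here is a hedged Python sketch. It does not describe how
Cassandra compaction actually works: `compact_row` is a hypothetical function,
and measuring a column as name bytes plus value bytes is a simplifying
assumption.

```python
def compact_row(columns, max_row_size_hint):
    """Keep the newest columns whose combined size fits within the hint.

    columns: iterable of (name, timestamp, value) with str names and
    bytes values. Returns the kept columns ordered oldest to newest.
    """
    kept = []
    total = 0
    # Walk from newest to oldest so that recent data is kept first.
    for name, timestamp, value in sorted(columns, key=lambda c: c[1], reverse=True):
        size = len(name.encode("utf-8")) + len(value)
        # Always keep the newest column, even if it alone exceeds the hint.
        if kept and total + size > max_row_size_hint:
            break
        kept.append((name, timestamp, value))
        total += size
    kept.reverse()
    return kept
```

For example, with three 11-byte columns and a hint of 25 bytes, the pass keeps
the two newest columns and drops the oldest one.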
P.S.: Sorry for my poor English; this is my very first "issue" :)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira