Just a few data points from our experience.

One of our use cases involves storing a periodic full base state for millions 
of records, then fairly frequent delta updates to subsets of the records in 
between. C* is great for this because we can read the whole row (or up to the 
clustering key/column marking “now” as perceived by the client) and munge the 
base + deltas together in the client.
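
As a rough illustration of that client-side merge, here is a minimal Python sketch. The row layout (integer clustering keys, key 0 holding the base state, dict payloads) and the name merge_row are assumptions made for the example, not our actual schema, and it assumes a delta simply overwrites the fields it carries.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row layout: each column is (clustering_key, payload), where the
# lowest clustering key holds the full base state and later keys hold deltas.
Column = Tuple[int, Dict[str, object]]


def merge_row(columns: Iterable[Column], now: int) -> Dict[str, object]:
    """Fold the base state and every delta with clustering key <= now."""
    state: Dict[str, object] = {}
    for key, payload in sorted(columns, key=lambda c: c[0]):
        if key > now:
            break  # ignore anything written after the client's "now"
        state.update(payload)  # assumption: later deltas overwrite earlier fields
    return state


row = [
    (0, {"price": 10, "qty": 5}),  # base state
    (1, {"qty": 7}),               # delta
    (2, {"price": 12}),            # delta written after "now" below
]
print(merge_row(row, now=1))  # {'price': 10, 'qty': 7}
```

Deltas with richer semantics (increments, deletes) would need a different fold step than a plain dict update.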

To keep rows small (and for recovery), we start over in a new CF whenever we 
start a new base state.

The upshot is that we have pretty much the same scenario as Jeremy is describing.

For this use case we are also using Astyanax (but with C* 2.0.5).

We have not come across many of the schema problems you mention (which is 
likely attributable to some changes in the 2.0.x line). However, one thing to 
note is that Astyanax itself seems to be very picky about unresolved schema 
changes. We found that we had to make the schema changes via a CQL “create table” 
(we can still use Astyanax for that) rather than creating the CF via old-style 
Thrift CF creation.
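
One way to avoid issuing a change while an earlier one is still unresolved is to poll for schema agreement instead of sleeping for a fixed time. The sketch below is only an illustration, not what we run: wait_for_schema_agreement and fetch_versions are hypothetical names, and fetch_versions stands in for whatever your client uses to collect the schema versions reported by the nodes (for example, reading schema_version from system.local and system.peers).

```python
import time
from typing import Callable, Set


def wait_for_schema_agreement(
    fetch_versions: Callable[[], Set[str]],
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once all nodes report one schema version, False on timeout.

    fetch_versions is a caller-supplied (hypothetical) function returning the
    set of distinct schema versions currently seen across the cluster.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if len(fetch_versions()) == 1:
            return True  # every node agrees; safe to make the next change
        if time.monotonic() >= deadline:
            return False  # still disagreeing; let the caller decide what to do
        sleep(poll_s)
```

The caller would run this after each create or drop and only proceed when it returns True.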


On May 13, 2014, at 9:42 AM, Jeremy Powell <jeremym.pow...@gmail.com> wrote:

> Hi Kevin,
> 
> C* version: 1.2.xx
> Astyanax: 1.56.xx
> 
> We basically do this same thing in one of our production clusters, but rather 
> than dropping SSTables, we drop Column Families. We time-bucket our CFs, and 
> when a CF has passed some time threshold (metadata or embedded in CF name), 
> it is dropped. This means there is a home-grown system that is doing the 
> bookkeeping/maintenance rather than relying on C*'s inner workings. It is 
> unfortunate that we have to maintain a system which maintains CFs, but we've 
> been in a pretty good state for the last 12 months using this method. 
> 
> Some caveats:
> 
> By default, C* makes snapshots of your data when a table is dropped. You can 
> leave that and have something else clear up the snapshots, or if you're less 
> paranoid, set auto_snapshot: false in the cassandra.yaml file.
> 
> Cassandra does not handle 'quick' schema changes very well, and we found that 
> only one node should be used for these changes. When adding or removing 
> column families, we have a single, property-defined C* node that is 
> designated as the schema node. After making a schema change, we had to throw 
> in an artificial delay to ensure that the schema change propagated through 
> the cluster before making the next schema change. And of course, relying on a 
> single node being up for schema changes is less than ideal, so handling 
> failover to a new node is important.
> 
> The final, and hardest problem, is that C* can't really handle schema changes 
> while a node is being bootstrapped (new nodes, replacing a dead node). If a 
> column family is dropped, but the new node has not yet received that data 
> from its replica, the node will fail to bootstrap when it finally begins to 
> receive that data - there is no column family for the data to be written to, 
> so that node will be stuck in the joining state, and its system keyspace 
> needs to be wiped and re-synced to attempt to get back to a happy state. This 
> unfortunately means we have to stop schema changes when a node needs to be 
> replaced, but we have this flow down pretty well.
> 
> Hope this helps,
> Jeremy Powell
> 
> 
> On Mon, May 12, 2014 at 5:53 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> We have a log only data structure… everything is appended and nothing is ever 
> updated.
> 
> We should be totally fine with having lots of SSTables sitting on disk 
> because even if we did a major compaction the data would still look the same.
> 
> By 'lots' I mean maybe 1000 max.  Maybe 1GB each.
> 
> However, I would like a way to delete older data.
> 
> One way to solve this could be to just drop an entire SSTable if all the 
> records inside have tombstones.
> 
> Is this possible, to just drop a specific SSTable?  
> 
> -- 
> 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
> people.
> 
> 
