Re: Efficient Paging Option in Wide Rows

2016-04-24 Thread Clint Martin
I tend to agree with Carlos. Having multiple row keys and parallelizing your queries will tend to result in faster responses. Keeping partitions relatively small will also help your cluster manage your data more efficiently, resulting in better performance. One thing I would recommend is
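A minimal CQL sketch of the bucketing idea described above, splitting one logical wide row across several partitions so the client can query the buckets in parallel; table and column names are hypothetical:

    CREATE TABLE sensor_readings (
        sensor_id    text,
        bucket       int,        -- e.g. derived from the timestamp, spreads one logical row over N partitions
        reading_time timestamp,
        value        double,
        PRIMARY KEY ((sensor_id, bucket), reading_time)
    );

    -- the client issues one query per bucket, in parallel, and merges the results
    SELECT reading_time, value
    FROM sensor_readings
    WHERE sensor_id = 'sensor-1' AND bucket = 0 AND reading_time >= '2016-04-01';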

Re: Upgrading to SSD

2016-04-23 Thread Clint Martin
As long as you shut down the node before you start copying and moving stuff around, it shouldn't matter whether you take backups or snapshots or whatever. When you add the filesystem for the SSD, will you be removing the existing filesystem? Or will you be able to keep both filesystems mounted at the

Re: Rack aware question.

2016-03-23 Thread Clint Martin
I could be wrong on this since I've never actually attempted what you are asking. Based on my understanding of how replica assignment is done, I don't think that just changing the rack on an existing node is a good idea. Changing racks for a node that already contains data would result in that

Re: Multi DC setup for analytics

2016-03-20 Thread Clint Martin
When you say you have two logical DCs, both with the same name, are you saying that you have two clusters of servers both with the same DC name, neither of which currently talks to the other, i.e. they are two separate rings? Or do you mean that you have two keyspaces in one cluster? Or? Clint On Mar

Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Clint Martin
I would arrange your primary key by how you intend to query. Primary key ((executedby), auditid) This allows you to query for who did it, and optionally on a time range for when it occurred, retrieving results in chronological order. You could do it with your proposed schema and Lucene, but for what
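A hedged sketch of the table that primary key implies, assuming auditid is a timeuuid so rows within a partition sort chronologically; the other column names are made up:

    CREATE TABLE audit_trail (
        executedby text,
        auditid    timeuuid,
        action     text,
        details    text,
        PRIMARY KEY ((executedby), auditid)
    ) WITH CLUSTERING ORDER BY (auditid ASC);

    -- who did it, optionally restricted to a time range, returned in chronological order
    SELECT auditid, action, details
    FROM audit_trail
    WHERE executedby = 'alice'
      AND auditid > maxTimeuuid('2016-03-01')
      AND auditid < minTimeuuid('2016-03-19');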

Re: Strategies for avoiding corrupted duplicate data?

2016-03-19 Thread Clint Martin
Lightweight transactions are going to be somewhat key to this. As are batches. The interesting thing about these views is that changing an email address is not the same operation on all of them. For the users-by-email view you have to delete a given existing row and insert a new one. For the
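A rough sketch of that email-change case, using a lightweight transaction to claim the new address and a logged batch to swap the lookup rows; the table layout here is assumed, not taken from the original thread:

    -- claim the new address only if nobody else already holds it
    INSERT INTO users_by_email (email, user_id)
    VALUES ('new@example.com', 42)
    IF NOT EXISTS;

    -- if the LWT was applied, swap the lookup rows; the conditional insert stays
    -- separate because a batch cannot mix conditions across different partitions
    BEGIN BATCH
        DELETE FROM users_by_email WHERE email = 'old@example.com';
        UPDATE users SET email = 'new@example.com' WHERE user_id = 42;
    APPLY BATCH;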

Re: Compaction Filter in Cassandra

2016-03-19 Thread Clint Martin
I would definitely be interested in this. Clint On Mar 15, 2016 9:36 PM, "Eric Stevens" wrote: > We have been working on filtering compaction for a month or so (though we > call it deleting compaction, its implementation is as a filtering > compaction strategy). The feature

Re: Replacing disks

2016-02-28 Thread Clint Martin
nuanced. DataStax had a blog post that describes this better, as well as limitations to the algorithm in 2.1 which are addressed in the 3.x releases.) Clint On Feb 28, 2016 10:11 AM, "Michał Łowicki" <mlowi...@gmail.com> wrote: > > > On Sun, Feb 28, 2016 at 4:00 PM, C

Re: Replacing disks

2016-02-28 Thread Clint Martin
Your plan for replacing your 200 GB drive sounds good to me. Since you are running JBOD, I wouldn't worry about manually redistributing data from your other disk to the new one; Cassandra will do that for you as it performs compaction. While you're doing the drive change, you need to complete the

Re: Cassandra Collections performance issue

2016-02-11 Thread Clint Martin
I have experienced performance issues while using collections as well. Mostly my issue was due to the excessive number of cells per partition that even a modest map size requires. Since you are reading and writing the entire map, you can probably gain some performance the same way I
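One common way to cut the per-cell overhead of a map, assuming the whole map really is read and written in one go, is to freeze it so the collection is stored as a single cell; this schema is illustrative, not from the thread:

    -- non-frozen map: every entry is its own cell, adding per-cell overhead on reads and writes
    CREATE TABLE user_prefs_unfrozen (
        user_id text PRIMARY KEY,
        prefs   map<text, text>
    );

    -- frozen map: the whole collection is serialized as one cell, but it must be
    -- rewritten in full on every update (fine if you already read/write it whole)
    CREATE TABLE user_prefs_frozen (
        user_id text PRIMARY KEY,
        prefs   frozen<map<text, text>>
    );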

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Clint Martin
What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for. Clint On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > Hi Nate, > > Yes, I've been thinking about treating customers as either small or big, >

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Clint Martin
You should endeavor to use a repeatable method of segmenting your data. Swapping partitions every time you "fill one" seems like an anti-pattern to me, but I suppose it really depends on what your primary key is. Can you share some more information on this? In the past I have utilized the
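A minimal sketch of what a repeatable segmentation scheme can look like, deriving the bucket deterministically from the data (here a per-day bucket) instead of switching partitions whenever one fills up; the names are illustrative:

    CREATE TABLE events_by_customer (
        customer_id text,
        day         date,       -- bucket derived deterministically from the event time
        event_time  timestamp,
        payload     text,
        PRIMARY KEY ((customer_id, day), event_time)
    );

    -- any reader can recompute the same buckets for a time range and query them directly
    SELECT event_time, payload
    FROM events_by_customer
    WHERE customer_id = 'cust-1' AND day = '2016-01-04';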

Re: Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread Clint Martin
Generating the timeuuid on the server side via the now() function also makes the operation non-idempotent. This may not be a huge problem for your application, but it is something to keep in mind. Clint On Oct 29, 2015 9:01 AM, "Kai Wang" wrote: > If you want the timestamp to
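To illustrate the point above: with now() every retry of the same insert produces a fresh timeuuid, so a retried write can leave a duplicate row, whereas a client-generated timeuuid is re-sent unchanged on retry and the write stays idempotent; the table here is hypothetical:

    CREATE TABLE events (
        stream_id text,
        event_id  timeuuid,
        body      text,
        PRIMARY KEY ((stream_id), event_id)
    );

    -- server-side generation: retrying this statement creates a new row each time
    INSERT INTO events (stream_id, event_id, body)
    VALUES ('s1', now(), 'payload');

    -- client-side generation: the same UUID can be re-sent on retry, so the write is idempotent
    INSERT INTO events (stream_id, event_id, body)
    VALUES ('s1', 1f1ec2a0-7e31-11e5-8bcf-feff819cdc9f, 'payload');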

Re: Downtime-Limit for a node in Network-Topology-Replication-Cluster?

2015-10-24 Thread Clint Martin
The max hint window is only part of the equation. If a node is down longer than the max hint window, a repair will still fix it up for you. The maximum time a node can be down before it must be rebuilt is determined by the lowest gc_grace_seconds setting on your various tables. By default gc_grace_seconds is 10 days,
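For reference, a sketch of where those two knobs live; gc_grace_seconds is a per-table schema setting, while the hint window is configured in cassandra.yaml (the keyspace and table names here are placeholders):

    -- gc_grace_seconds is set per table; the default of 864000 seconds is the 10 days mentioned above
    ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 864000;

    -- the hint window is not a schema setting; it lives in cassandra.yaml:
    --   max_hint_window_in_ms: 10800000   # default, 3 hours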

Re: Changing schema on multiple nodes while they are isolated

2015-10-05 Thread Clint Martin
the cluster is really > interesting. I am going to explore that in more detail. > > Thanks for the good idea. > > On 3 October 2015 at 00:03, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> You could use a two key space method. At star

Re: Changing schema on multiple nodes while they are isolated

2015-10-02 Thread Clint Martin
You could use a two-keyspace method. At startup, wait some time for the node to join the cluster. The first time the app starts, you can be in one of three states: The happiest state is that you succeed in joining a cluster. In this case the cluster's keyspace will be replicated to you and you can