Re: Using 5-6 bytes for cassandra timestamps vs 8…

2012-01-19 Thread Ertio Lew
It obviously won't matter if your columns are fat, but there are several cases (at least I can think of several) where you need to store, for example, just an integer column name with an empty column value. Thus 12 bytes for the column where 8 bytes is just the overhead to store timestamps doesn't
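As a rough sketch of the arithmetic in this post, using only its own figures (a 4-byte integer column name, an empty value, an 8-byte timestamp) and ignoring any other per-column serialization overhead, the share of bytes spent on the timestamp at each width would be roughly as follows. The constant and function names are made up for illustration.

```python
# Figures taken from the post: 4-byte integer name + 8-byte timestamp = 12 bytes.
# Assumption: any other per-column serialization overhead is ignored.
NAME_BYTES = 4
VALUE_BYTES = 0


def timestamp_share(timestamp_bytes):
    """Return (total bytes per column, fraction of that total spent on the timestamp)."""
    total = NAME_BYTES + VALUE_BYTES + timestamp_bytes
    return total, timestamp_bytes / total


for width in (8, 6, 5):
    total, share = timestamp_share(width)
    print(f"{width}-byte timestamp: {total} bytes per column, {share:.0%} timestamp")
```

Under these assumptions the gain only matters for very thin columns; once the value grows, the timestamp share shrinks on its own.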

Re: How to store unique visitors in cassandra

2012-01-19 Thread Alain RODRIGUEZ
Hi, thanks for your answer, but I don't want to add another layer on top of Cassandra. I have also built my whole application without Countandra and I would like to continue this way. Furthermore there is a Cassandra modeling problem that I would like to solve, and not just hide. Alain 2012/1/18

Re: poor Memtable performance on column slices?

2012-01-19 Thread Sylvain Lebresne
On Thu, Jan 19, 2012 at 3:54 AM, Josep Blanquer blanq...@rightscale.com wrote: On Wed, Jan 18, 2012 at 12:44 PM, Jonathan Ellis jbel...@gmail.com wrote: On Wed, Jan 18, 2012 at 12:31 PM, Josep Blanquer blanq...@rightscale.com wrote: If I do a slice without a start (i.e., get me the first

Re: Max records per node for a given secondary index value

2012-01-19 Thread aaron morton
Each node stores the rows in its token range, and those in the token ranges it is a replica for. So it will store roughly rf / num_nodes of the rows. If you are approaching a situation where the node may store 2 billion rows, and so may have 2 billion entries in the secondary index row, you
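A minimal sketch of that estimate, assuming evenly balanced tokens so that each node holds about rf / num_nodes of all rows. The cluster figures are hypothetical; the 2 billion threshold is the one mentioned in the post, and the worst case shown is every row on a node sharing one indexed value.

```python
# Threshold mentioned in the post for entries in a single secondary index row.
INDEX_ROW_ENTRY_LIMIT = 2_000_000_000


def rows_per_node(total_rows, num_nodes, rf):
    """Approximate rows held by one node, assuming evenly balanced tokens."""
    return total_rows * rf / num_nodes


def may_exceed_index_limit(total_rows, num_nodes, rf):
    """Worst case: every row on the node carries the same indexed value."""
    return rows_per_node(total_rows, num_nodes, rf) >= INDEX_ROW_ENTRY_LIMIT


# Hypothetical cluster: 10 billion rows, 30 nodes, RF=3.
print(rows_per_node(10_000_000_000, 30, 3))
print(may_exceed_index_limit(10_000_000_000, 30, 3))
```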

Re: Incremental backups

2012-01-19 Thread aaron morton
Did you run a scrub as part of the upgrade process ? That will re-write all the sstables and remove the old ones. If not run a scrub now and it will re-write the data with a -hb- format in the file name. Cheers - Aaron Morton Freelance Developer @aaronmorton

Re: How to store unique visitors in cassandra

2012-01-19 Thread aaron morton
Some tips here from Matt Dennis on how to model time series data http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/01/2012, at 10:30 PM, Alain RODRIGUEZ wrote: Hi

RE: Incremental backups

2012-01-19 Thread Michael Vaknine
When I upgraded I did it in 2 stages. Upgrade from 0.7.6 to 1.0.0. Run scrub on each node. Run repair on the cluster. Upgrade to 1.0.3. Is it safe to run scrub again? Because it did not seem to help when I updated it to 1.0.0. Was there a bug in the scrub process in 1.0.0? What is the

CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread vaibhav . s
Hi, I've defined a column family 'Vaibhav' in which every row has a few columns and their values. I've declared two columns as secondary indexes so that I can filter the rows on the basis of those column values. Now whenever I execute a CQL query with either only rowkey or column name in 'WHERE'

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
On 18.01.2012, at 02:19, Maki Watanabe wrote: Is there any significant difference in the number of sstables on each node? No, no significant difference there. Actually, node 8 is among those with more sstables but with the least load (20GB). On 17.01.2012, at 20:14, Jeremiah Jordan wrote: Are you

RE: Garbage collection freezes cassandra node

2012-01-19 Thread Rene Kochen
Thanks for your comments. The application is indeed suffering from a freezing Cassandra node. Queries are taking longer than 10 seconds at the moment of a full garbage collect. Here is an example from the logs. I have a three node cluster. At some point I see on a node the following log:

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
2012/1/19 aaron morton aa...@thelastpickle.com: If you have performed any token moves the data will not be deleted until you run nodetool cleanup. We did that after adding nodes to the cluster. And then, the cluster wasn't balanced either. Also, does the Load really account for dead data, or is

Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-19 Thread Andrei Savu
On Wed, Jan 18, 2012 at 7:58 PM, Rustam Aliyev rus...@code.az wrote: Hi Andrei, As you know, we are using Whirr for ElasticInbox ( https://github.com/elasticinbox/whirr-elasticinbox). While testing we encountered a few minor problems which I think could be improved. Note that we were using

Re: Deploying Cassandra 1.0.7 on EC2 in minutes

2012-01-19 Thread Rustam Aliyev
Great, will try 0.7.1 when it's ready. (Bug I mentioned was already reported) On 19/01/2012 13:15, Andrei Savu wrote: On Wed, Jan 18, 2012 at 7:58 PM, Rustam Aliyev rus...@code.az wrote: Hi Andrei, As you know, we are using Whirr for ElasticInbox

Re: nodetool ring question

2012-01-19 Thread R. Verlangen
I will have a look very soon and if I find something I'll let you know. Thank you in advance! 2012/1/19 aaron morton aa...@thelastpickle.com Michael, Robin Let us know if the reported live load is increasing and diverging from the on disk size. If it is can you check nodetool cfstats and

Re: How to store unique visitors in cassandra

2012-01-19 Thread Alain RODRIGUEZ
Thanks aaron, I already paid attention to these slides and I just looked at them again. I'm still in the dark about how to get the number of unique visitors between 2 dates (randomly chosen, because chosen by user) efficiently. I could easily count them per hour, day, week, month... But it's a

Re: Garbage collection freezes cassandra node

2012-01-19 Thread Mohit Anchlia
What version of Java do you use? Can you try reducing NewSize and increasing the old generation? If you are on an old version of Java I also recommend upgrading that version. On Thu, Jan 19, 2012 at 3:27 AM, Rene Kochen rene.koc...@emea.schange.com wrote: Thanks for your comments. The application

Re: Incremental backups

2012-01-19 Thread aaron morton
mmm, they are not included in the snapshot they are probably not used. Have you dropped an index called 09partition on AttractionCheckins? In [52]: ''.join(chr(int(x+y, 16)) for x,y in zip('3039706172746974696f6e'[0::2], '3039706172746974696f6e'[1::2])) Out[52]: '09partition' The simple thing to do is
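The same hex-to-text decoding can be sketched more directly with the standard library. This assumes, as the output in the post suggests, that the hex string is plain ASCII; the variable names are illustrative.

```python
# Hex string quoted in the post, decoded two hex digits per character.
hex_name = "3039706172746974696f6e"

# bytes.fromhex turns each pair of hex digits into one byte.
decoded = bytes.fromhex(hex_name).decode("ascii")
print(decoded)  # 09partition
```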

Re: CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread aaron morton
It is working as expected. Because you have specified a KEY the query returns records that match the key(s), and it ignores the other clauses. Selecting rows follows one of three paths: * select rows by key(s) * select rows by key range, i.e. rows after this key. * select rows by

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread aaron morton
Load reported from nodetool ring is the live load, which means SSTables that the server has open and will read from during a request. This will include tombstones, expired and overwritten data. nodetool cfstats also includes dead load, which is sstables that are not in use but still on disk.

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Narendra Sharma
I believe you need to move the nodes on the ring. What was the load on the nodes before you added 5 new nodes? It's just that you are getting more data in certain token ranges than in others. -Naren On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach marcel.steinb...@chors.de wrote: On 18.01.2012,

Re: How to store unique visitors in cassandra

2012-01-19 Thread Tyler Hobbs
On Thu, Jan 19, 2012 at 8:25 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I'm still in the dark about how to get the number of unique visitors between 2 dates (randomly chosen, because chosen by user) efficiently. I could easily count them per hour, day, week, month... But it's a bit
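To make the difficulty in the quoted question concrete, here is a small sketch of why per-day counts cannot simply be added up for an arbitrary date range: a visitor seen on several days must be counted once. The in-memory dictionary is a hypothetical stand-in for per-day rows of visitor IDs, not a Cassandra schema from the thread, and the approach shown is not necessarily what the replies went on to suggest.

```python
from datetime import date, timedelta

# Hypothetical stand-in for one row of visitor IDs per day.
visitors_by_day = {
    date(2012, 1, 17): {"alice", "bob"},
    date(2012, 1, 18): {"bob", "carol"},
    date(2012, 1, 19): {"alice", "dave"},
}


def days_between(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def unique_visitors(start, end):
    """Count distinct visitors in the range by unioning the per-day sets."""
    seen = set()
    for day in days_between(start, end):
        seen |= visitors_by_day.get(day, set())
    return len(seen)


def summed_daily_counts(start, end):
    """Add up per-day counts; this overcounts repeat visitors."""
    return sum(len(visitors_by_day.get(day, set())) for day in days_between(start, end))


start, end = date(2012, 1, 17), date(2012, 1, 19)
print(unique_visitors(start, end), summed_daily_counts(start, end))
```

The union gives the right answer but has to read every visitor ID in the range, which is the efficiency concern raised in the question.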

Re: How to store unique visitors in cassandra

2012-01-19 Thread Milind Parikh
You might want to look at the code in countandra.org, regardless of whether you use it. It uses a model of dynamic composite keys (although static composite keys would have worked as well). For the actual query, only one row is hit. This of course only works because the data model is attuned for the

RE: cassandra 1.0.6 rpm

2012-01-19 Thread Shu Zhang
Thanks Philippe, I checked their docs. RPMs should be at http://rpm.datastax.com/community/ now, but 1.0.6 is not there either. Can someone at datastax please comment on this? Are you guys no longer packaging cassandra releases? From: Philippe

Re: Garbage collection freezes cassandra node

2012-01-19 Thread Peter Schuller
On node 172.16.107.46, I see the following: 21:53:27.192+0100: 1335393.834: [GC 1335393.834: [ParNew (promotion failed): 319468K->324959K(345024K), 0.1304456 secs]1335393.964: [CMS: 6000844K->3298251K(8005248K), 10.8526193 secs] 6310427K->3298251K(8350272K), [CMS Perm :
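For reading log lines like the one quoted, a small sketch that pulls the CMS figures out of the text may help. It assumes the usual HotSpot layout of `before->after(capacity), seconds`, and also tolerates the arrow being reduced to a bare `-` as can happen in archived mail. The function name and regular expression are illustrative, not part of any Cassandra or JVM tool.

```python
import re

# Matches "[CMS: <before>K-><after>K(<capacity>K), <secs> secs]".
# The ">" is optional so that a mangled "K-" separator still parses.
CMS_PATTERN = re.compile(r"\[CMS: (\d+)K->?(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def parse_cms(line):
    """Return (before_kb, after_kb, capacity_kb, pause_secs), or None if absent."""
    match = CMS_PATTERN.search(line)
    if match is None:
        return None
    before, after, capacity, secs = match.groups()
    return int(before), int(after), int(capacity), float(secs)


sample = (
    "[ParNew (promotion failed): 319468K->324959K(345024K), 0.1304456 secs]"
    "1335393.964: [CMS: 6000844K->3298251K(8005248K), 10.8526193 secs]"
)
print(parse_cms(sample))
```

On the quoted line the pause field is the 10.85 seconds spent in the CMS collection that followed the ParNew promotion failure.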

Re: cassandra hit a wall: Too many open files (98567!)

2012-01-19 Thread Thorsten von Eicken
Ah, that explains part of the problem indeed. The whole situation still doesn't make a lot of sense to me, unless the answer is that the default sstable size with level compaction is just no good for large datasets. I restarted cassandra a few hours ago and it had to open about 32k files at

ideal cluster size

2012-01-19 Thread Thorsten von Eicken
We're embarking on a project where we estimate we will need on the order of 100 cassandra nodes. The data set is perfectly partitionable, meaning we have no queries that need to have access to all the data at once. We expect to run with RF=2 or =3. Is there some notion of ideal cluster size? Or

Re: CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread vaibhav . s
Dear Aaron, Thanks for the information. Actually it's a normal query which works with SQL. I believe there will be some mechanism to do so in Cassandra, as first retrieving the records based on key and then checking for the column index later will be inefficient. Thanks again. Regards,

Re: ideal cluster size

2012-01-19 Thread Peter Schuller
We're embarking on a project where we estimate we will need on the order of 100 cassandra nodes. The data set is perfectly partitionable, meaning we have no queries that need to have access to all the data at once. We expect to run with RF=2 or =3. Is there some notion of ideal cluster size?

Re: CQL 'Where' clause ignores secondary index filter

2012-01-19 Thread Sylvain Lebresne
I think that qualifies as a bug. We should either refuse the query if we don't know how to do this correctly or return a sensible result (i.e., no result in that case). Would you mind opening a ticket on https://issues.apache.org/jira/browse/CASSANDRA? -- Sylvain On Fri, Jan 20, 2012 at 6:39 AM,