Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Alain RODRIGUEZ
Hi, Sorry about the gravedigging, but what would be a good starting value for rpc_max_threads? I mean, the default is unlimited, and the commented-out value is 2048. Native protocol seems to only allow 128 simultaneous threads. Should I stick to 2048 or try with something closer to 128 or even something

diff cassandra.yaml 1.2 -- 2.1

2014-12-29 Thread Alain RODRIGUEZ
Hi guys, I am looking at the options added and dropped in Cassandra between 1.2.18 and 2.0.11 and this makes me wonder: Why has the index_interval option been removed from cassandra.yaml? I know we can also define it on a per-table basis, yet this global option was quite useful to tune memory

User click count

2014-12-29 Thread Ajay
Hi, Is it better to use a counter for user click counts than to create a new row per click (keyed as user id : timestamp) and count the rows? Basically we want to track user clicks and use them for hourly/daily/monthly reports. Thanks Ajay

Re: User click count

2014-12-29 Thread Janne Jalkanen
Hi! It’s really a tradeoff between accurate and fast and your read access patterns; if you need it to be fairly fast, use counters by all means, but accept the fact that they will (especially in older versions of cassandra or adverse network conditions) drift off from the true click count.

Re: User click count

2014-12-29 Thread Ajay
Hi, So you mean to say counters are not accurate? (It is highly likely that multiple parallel threads will be trying to increment the counter as users click the links). Thanks Ajay On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: Hi! It’s really a tradeoff between

Re: User click count

2014-12-29 Thread Alain RODRIGUEZ
Hi Ajay, Here is a good explanation you might want to read. http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters That said, we have used counters for 3 years now, starting with C* 0.8, and we are happy with them. Limits I can see in both ways are:

Re: Best practice for sorting on frequent updated column?

2014-12-29 Thread Eric Stevens
This is a bit difficult. Depending on your access patterns and data volume, I'd be inclined to keep a separate table with a (count, foreign_key) clustering key. Then do a client-side join to read the data back in the order you're looking for. That will at least make the heavily updated table
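
A rough illustration of the suggestion above (a separate table ordered by (count, foreign_key), read back with a client-side join). This is a minimal in-memory sketch, not Cassandra code: the class name `RankedIndex`, its fields, and its methods are hypothetical stand-ins for the two tables, and a real implementation would delete the old clustering row and insert the new one in the index table.

```python
import bisect


class RankedIndex:
    """In-memory stand-in for a main table plus a (count, key)-ordered index table."""

    def __init__(self):
        self.rows = {}      # key -> (count, payload): the heavily updated main table
        self.by_count = []  # sorted list of (count, key): the separate ordering table

    def upsert(self, key, count, payload):
        old = self.rows.get(key)
        if old is not None:
            # The ordering entry is identified by (old_count, key), so it has to
            # be removed before the new one is written.
            i = bisect.bisect_left(self.by_count, (old[0], key))
            del self.by_count[i]
        bisect.insort(self.by_count, (count, key))
        self.rows[key] = (count, payload)

    def top(self, n):
        """Client-side join: walk the ordering table, then look up each main row."""
        ordered = self.by_count[::-1][:n]  # highest counts first
        return [(key, count, self.rows[key][1]) for count, key in ordered]


idx = RankedIndex()
idx.upsert("a", 1, "row A")
idx.upsert("b", 5, "row B")
idx.upsert("a", 7, "row A, updated")
print(idx.top(2))
```

The point of the split is that every count change touches the small ordering structure, while reads pay for one extra lookup per row.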

Re: diff cassandra.yaml 1.2 -- 2.1

2014-12-29 Thread Jason Wee
https://issues.apache.org/jira/browse/CASSANDRA-3534 On Mon, Dec 29, 2014 at 6:58 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi guys, I am looking at the options added and dropped in Cassandra between 1.2.18 and 2.0.11 and this makes me wonder: Why has the index_interval option been removed

Re: User click count

2014-12-29 Thread Ajay
Thanks for the clarification. In my case, Cassandra is the only storage. If the counters become incorrect, they couldn't be corrected. If we store raw data for that, we may as well go with that approach. But the granularity has to be down to the second, since more than one user can click the same link. So the

Re: User click count

2014-12-29 Thread Eric Stevens
If the counters get incorrect, it couldn't be corrected You'd have to store something that allowed you to correct it. For example, the TimeUUID approach to keep true counts, which are slow to read but accurate, and a background process that trues up your counter columns periodically. On Mon,
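
The "true up" idea described above can be sketched roughly as follows. This is an in-memory illustration only, with no Cassandra driver involved: `ClickStore`, `record_click`, `true_up`, and the `counter_applied` flag are hypothetical names, and `uuid.uuid1()` stands in for a TimeUUID. Because a counter column can only be incremented or decremented, the correction is applied as a delta rather than as an overwrite.

```python
import uuid
from collections import defaultdict


class ClickStore:
    """In-memory stand-in for a raw-event table and a counter table."""

    def __init__(self):
        self.events = defaultdict(list)   # link_id -> time-based UUIDs (accurate, slow to count)
        self.counters = defaultdict(int)  # link_id -> fast counter that may drift

    def record_click(self, link_id, counter_applied=True):
        # The raw event is always stored; uuid1() is time-based, like a TimeUUID.
        self.events[link_id].append(uuid.uuid1())
        # counter_applied=False simulates an increment that was lost.
        if counter_applied:
            self.counters[link_id] += 1

    def true_up(self):
        """Background pass: bring each counter back in line with the raw events."""
        corrections = {}
        for link_id, evts in self.events.items():
            delta = len(evts) - self.counters[link_id]
            if delta != 0:
                self.counters[link_id] += delta  # applied as an increment/decrement
                corrections[link_id] = delta
        return corrections


store = ClickStore()
store.record_click("link-1")
store.record_click("link-1", counter_applied=False)  # lost increment
store.record_click("link-1")
print(store.counters["link-1"], store.true_up(), store.counters["link-1"])
```

Reads for reports can then use the fast counters, with the raw events kept as the source of truth for the periodic correction.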

Re: diff cassandra.yaml 1.2 -- 2.1

2014-12-29 Thread Alain RODRIGUEZ
Thanks for the pointer Jason, Yet, I thought that cache and memtables went off-heap only in version 2.1 and not 2.0 (As of Cassandra 2.0, there are two major pieces of the storage engine that still depend on the JVM heap: memtables and the key cache. --

Re: Why a cluster don't start after cassandra.yaml range_timeout parameter change ?

2014-12-29 Thread Alain RODRIGUEZ
Did you solve this issue? I guess nobody answered you because this is very weird. I also guess you've made some mistake in the configuration. Anyway, let me know if you managed to get out of the mess somehow or if you still need help. C*heers 2014-12-03 15:57 GMT+01:00 Castelain, Alain

Re: Repair/Compaction Completion Confirmation

2014-12-29 Thread Alain RODRIGUEZ
I noticed (and reported) a bug that made me drop this tool -- https://github.com/BrianGallew/cassandra_range_repair/issues/16 Might this be related somehow ? C*heers Alain 2014-11-21 13:30 GMT+01:00 Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com: Hey guys, Just reviving this

Re: diff cassandra.yaml 1.2 -- 2.1

2014-12-29 Thread Jason Wee
What you are asking may be answered at the code level and is pretty deep stuff, at least from a user's (like me) point of view. But to quote Jonathan in CASSANDRA-3534, Then you will be able to say use X amount of memory for memtables, Y amount for the cache (and monitor Z amount for the bloom filters)

Re: diff cassandra.yaml 1.2 -- 2.1

2014-12-29 Thread Alain RODRIGUEZ
I made an error in the topic title. We are indeed going to do it (that's why I made the mistake), but I am speaking of 1.2 -- 2.0 here, and we will start with this before going to 2.1, since we want to do it as a rolling upgrade. Thanks for your enlightening pointer about this vanished pressure

Re: CQL3 vs Thrift

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin wool...@gmail.com wrote: I'm biased in favor of using both thrift and CQL3, though many people on the list probably think I'm crazy. I don't think you're crazy but I do think you will ultimately face the deprecation of thrift. Briefly, I disbelieve

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
In my biased opinion something else should replace CQL and it needs a proper rewrite on the server side. I've studied the code and, having written query parsers and planners, what is there today isn't going to work long term. Whatever replaces both thrift and CQL needs to provide 100% of the

Re: Changing replication factor of Cassandra cluster

2014-12-29 Thread Pranay Agarwal
Thanks Ryan. I want to understand what is the best way to increase/change the replication factor of the Cassandra cluster? My priority is consistency and I can probably tolerate some downtime of the cluster. Is it totally weird to try changing the replication factor later or are there people doing it for

Re: Nodes Dying in 2.1.2

2014-12-29 Thread Robert Coli
On Wed, Dec 24, 2014 at 9:41 AM, Phil Burress philtburr...@gmail.com wrote: Just upgraded our cluster from 2.1.1 to 2.1.2 and our nodes keep dying. The kernel is killing the process due to out of memory: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/ Appears to

Re: Node down during move

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 12:29 AM, Jiri Horky ho...@avast.com wrote: just a follow up. We've seen this behavior multiple times now. It seems that the receiving node loses connectivity to the cluster and thus thinks that it is the sole online node, whereas the rest of the cluster thinks that it

Re: Changing replication factor of Cassandra cluster

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 1:40 PM, Pranay Agarwal agarwalpran...@gmail.com wrote: I want to understand what is the best way to increase/change the replication factor of the Cassandra cluster? My priority is consistency and I can probably tolerate some downtime of the cluster. Is it totally

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
Should I stick to 2048 or try with something closer to 128 or even something else ? 2048 worked fine for us. About HSHA, I anti-recommend hsha, serious apparently unresolved problems exist with it. We saw an improvement when we switched to HSHA, particularly for our offline

Re: Nodes Dying in 2.1.2

2014-12-29 Thread Robert Coli
Might be https://issues.apache.org/jira/browse/CASSANDRA-8061 or one of the linked/duplicate tickets. =Rob On Mon, Dec 29, 2014 at 1:40 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Dec 24, 2014 at 9:41 AM, Phil Burress philtburr...@gmail.com wrote: Just upgraded our cluster from

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 2:03 PM, mck m...@apache.org wrote: We saw an improvement when we switched to HSHA, particularly for our offline (hadoop/spark) nodes. Sorry i don't have the data anymore to support that statement, although i can say that improvement paled in comparison to

Internal pagination in secondary index queries

2014-12-29 Thread Sam Klock
Hi folks, Perhaps this is a question better addressed to the Cassandra developers directly, but I thought I'd ask it here first. We've recently been benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and we've noticed that when the number of items in an index reaches beyond

Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
Perf is better, correctness seems less so. I value latter more than former. Yeah no doubt. Especially in CASSANDRA-6285 i see some scary stuff went down. But there are no outstanding bugs that we know of, are there? (CASSANDRA-6815 remains just a wrap up of how options are to be presented

Re: CQL3 vs Thrift

2014-12-29 Thread Eric Stevens
So while not exactly the same, this seems like a good analogy for suggesting a third interface to fix problems with existing interfaces: http://xkcd.com/927/ Even if the CQL parsing code in Cassandra is subpar (I haven't studied it), that's not an especially compelling case to suggest replacing

Re: Internal pagination in secondary index queries

2014-12-29 Thread Jonathan Haddad
Secondary indexes are there for convenience, not performance. If you're looking for something performant, you'll need to maintain your own indexes. On Mon Dec 29 2014 at 3:22:58 PM Sam Klock skl...@akamai.com wrote: Hi folks, Perhaps this is a question better addressed to the Cassandra

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
The kind of query language I'm thinking of is closer to Datalog, which is what Datomic uses. It's a personal bias, but I find it easier and cleaner to express joins, subqueries and correlated subqueries in a Lisp-like/Datalog-like syntax than SQL. Since CQL is modeled on/inspired by SQL, it inherits