Re: memtable overhead

2013-07-23 Thread Michał Michalski
Not sure how up-to-date this info is, but from some discussions that happened here a long time ago I remember that a minimum of 1MB per Memtable needs to be allocated. The other constraint here is the memtable_total_space_in_mb setting in cassandra.yaml, which you might wish to tune when having a

Re: OPP seems completely unsupported in Cassandra 1.2.5

2013-07-23 Thread Cyril Scetbon
AFAIK, OPP is no longer supported and you should use ByteOrderedPartitioner (support of non-UTF characters too) instead : see http://www.datastax.com/docs/1.2/cluster_architecture/partitioners -- Cyril SCETBON On Jul 22, 2013, at 4:10 PM, Vara Kumar varaku...@gmail.com wrote: We were using

Re: Safely adding new nodes without losing data

2013-07-23 Thread aaron morton
I think you are correct. When the new node starts it randomly selects tokens, which result in a random set of token ranges being transferred from other nodes. For each pending range the existing token ranges in the cluster are searched to find one that contains the range we want to transfer.
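The search described above (finding an existing range that contains a pending range) can be sketched roughly as follows. This is a minimal illustration and not Cassandra's implementation: ranges are modeled as (left, right] pairs of integer tokens on a ring, a range with left >= right is assumed to wrap around the ring, and the helper names `contains_range` and `find_source_range` are hypothetical.

```python
def wraps(rng):
    """A (left, right] range wraps around the ring when left >= right."""
    left, right = rng
    return left >= right


def contains_range(outer, inner):
    """Return True if every token in `inner` is also in `outer`."""
    o_left, o_right = outer
    i_left, i_right = inner
    if o_left == o_right:
        return True   # assumption: left == right means the full ring
    if i_left == i_right:
        return False  # inner is the full ring, outer is not
    if not wraps(outer):
        # A non-wrapping range can never contain a wrapping one.
        return (not wraps(inner)) and o_left <= i_left and i_right <= o_right
    if wraps(inner):
        return o_left <= i_left and i_right <= o_right
    # Outer wraps, inner does not: inner must sit entirely in one of the
    # two pieces of outer, (o_left, max] or (min, o_right].
    return i_left >= o_left or i_right <= o_right


def find_source_range(pending, existing_ranges):
    """Return the first existing range that contains `pending`, else None."""
    for candidate in existing_ranges:
        if contains_range(candidate, pending):
            return candidate
    return None
```

For example, with existing ranges (20, 80] and (80, 20], a pending range (5, 15] would be served from (80, 20], while a pending range (30, 90] straddles both and matches neither.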

Re: funnel analytics, how to query for reports etc.

2013-07-23 Thread aaron morton
For background on rollup analytics: Twitter Rainbird http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 Acunu http://www.acunu.com/ Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On

Re: How to avoid inter-dc read requests

2013-07-23 Thread aaron morton
All the read/write requests are issued with CL local quorum, but still there are a lot of inter-dc read requests. How are you measuring this? Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 22/07/2013, at 8:41 AM, sankalp

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread aaron morton
“Insert-heavy workloads will actually be CPU-bound in Cassandra before being memory-bound” What is the source for that? This is because everything is *first* written to the commit log *on disk*. Any thoughts?? Pretty much. Cheers - Aaron Morton Cassandra Consultant New

Re: CPU Bound Writes

2013-07-23 Thread aaron morton
That's very old documentation, try using the current docs. Although the statement is syntactically correct, it will become CPU bound before becoming memory bound. That statement says nothing about the IO use. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton

Re: OPP seems completely unsupported in Cassandra 1.2.5

2013-07-23 Thread Sylvain Lebresne
- I know that OPP is deprecated. Is it that OPP completely unsupported? Is it stated in upgrade instructions or some where? Did we miss it? Basically yes, OPP is not going to work in 1.2 because of the System tables. I don't think you'll find any upgrade instructions anywhere because to be

Re: Socket buffer size

2013-07-23 Thread aaron morton
Has anyone tried configuring the (internode_send_buff_size_in_bytes) parameter? Here is the Traceback (most recent call last): Are you setting this on the client or the server ? It's a server side setting from the cassandra.yaml file. Cheers - Aaron Morton Cassandra

Re: CL1 and CLQ with 5 nodes cluster and 3 alives node

2013-07-23 Thread aaron morton
I really don't think I have more than 500 million rows ... any smart way to count rows number inside the ks? use the output from nodetool cfstats, it has a row count and bloom filter size for each CF. You may also want to upgrade to 1.1 to get global cache management, that can make things

Re: Cassandra Out of Memory on startup while reading cache

2013-07-23 Thread aaron morton
As a work around remove the key / row caches before startup. Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 23/07/2013, at 6:44 AM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: Sounds like this:

Re: memtable overhead

2013-07-23 Thread aaron morton
An empty memtable would not take up much space, a few KB I would assume. However they are considered in the calculations that control how frequently to flush to disk. The more CF's, even if they do not have data, the more frequently you will flush to disk. Cheers - Aaron

About column family

2013-07-23 Thread bjbylh
Hi all: I have two questions to ask: 1. How many column families can be created in a cluster? Is there a limit on the number? 2. It takes 2-5 seconds to create a new CF while the cluster contains about 1 cfs (if the cluster is empty, it takes about 0.5s). Is it normal? How to improve the

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Out of curiosity, what version of hadoop are you using with cassandra? I think we are trying 0.20.2 if I remember(I have to ask my guy working on it to be sure). I do remember him saying the cassandra maven dependency was odd in that it is in the older version and not a newer hadoop version.

Re: How to avoid inter-dc read requests

2013-07-23 Thread Omar Shibli
I simply monitor the load avg of the nodes using opscenter. I started with idle nodes (by idle I mean load avg of all nodes 1.0), then started to run a lot of key slice read requests on *analytic DC* with CL local quorum (I also made sure that the client worked only with analytic DC), after

Re: sstable size change

2013-07-23 Thread Keith Wright
Can you elaborate on what you mean by let it take its own course organically? Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered? From: sankalp kohli kohlisank...@gmail.com Reply-To:

Re: NPE in CompactionExecutor

2013-07-23 Thread Paul Ingalls
I'm running the latest from the 1.2 branch as of a few days ago. I needed one of the patches that will be in 1.2.7 There was no error stack, just that line in the log. I wiped the database (deleted all the files in the lib dir) and restarted my data load, and am consistently running into the

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread hajjat
On Tue, Jul 23, 2013 at 5:05 AM, aaron morton [via cassandra-u...@incubator.apache.org] ml-node+s3065146n7589236...@n2.nabble.com wrote: “Insert-heavy workloads will actually be CPU-bound in Cassandra before being memory-bound” What is the source for that?

Re: funnel analytics, how to query for reports etc.

2013-07-23 Thread S Ahmed
Thanks Aaron. Too bad Rainbird isn't open sourced yet! On Tue, Jul 23, 2013 at 4:48 AM, aaron morton aa...@thelastpickle.com wrote: For background on rollup analytics: Twitter Rainbird http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 Acunu

high write load, with lots of updates, considerations? tombstoned data coming back to life

2013-07-23 Thread S Ahmed
I was watching some videos from the C* summit 2013 and I recall many people saying that if you can come up with a design where you don't perform updates on rows, that would make things easier (I believe it was because there would be less compaction). When building an Analytics (time series) app

Re: sstable size change

2013-07-23 Thread Robert Coli
On Tue, Jul 23, 2013 at 6:48 AM, Keith Wright kwri...@nanigans.com wrote: Can you elaborate on what you mean by let it take its own course organically? Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered? You see, when two (or more!)

Re: About column family

2013-07-23 Thread Robert Coli
On Tue, Jul 23, 2013 at 3:23 AM, bjbylh bjb...@me.com wrote: 1,how many column families can be created in a cluster?is there a limit to the number of it? Low number of hundreds is highest practical. The limit in practice is amount of heap, each CF consumes heap. 2,it spents 2-5 seconds to

Re: Safely adding new nodes without losing data

2013-07-23 Thread Robert Coli
On Sat, Jul 20, 2013 at 7:30 AM, E S tr1skl...@yahoo.com wrote: I am trying to understand the best procedure for adding new nodes. The one that I see most often online seems to have a hole where there is a low probability of permanently losing data. I want to understand what I am missing

Re: About column family

2013-07-23 Thread Hiller, Dean
We use PlayOrm to have 60,000 VIRTUAL column families such that the performance is just fine ;). You may want to try something like that. Dean From: Robert Coli rc...@eventbrite.com Reply-To: user@cassandra.apache.org

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Alex Popescu
I see pretty much the same formulation in the 1.2 docs, so I'm wondering what would be the best rewrite of that paragraph? On Tue, Jul 23, 2013 at 9:00 AM, hajjat haj...@purdue.edu wrote: On Tue, Jul 23, 2013 at 5:05 AM, aaron morton [via [hidden

Re: Are Writes disk-bound rather than CPU-bound?

2013-07-23 Thread Hiller, Dean
Out of curiosity, isn't this what is really happening: as writes keep coming in, memory fills up, causing flushes to the commit log disk of the whole memtable. In a bursting scenario, writes are thus limited only by memory and cpu in short bursting cases that tend to fit in memory. In a

Unable to describe table in CQL 3

2013-07-23 Thread Rahul Gupta
I am using Cassandra ver 1.1.9.7 Created a Column Family using Cassandra-cli. create column family events with comparator = 'CompositeType(DateType,UTF8Type)' and key_validation_class = 'UUIDType' and default_validation_class = 'UTF8Type'; I can describe this CF using CQL2 but getting error when

Re: sstable size change

2013-07-23 Thread sankalp kohli
Will Cassandra force any newly compacted files to my new setting as compactions are naturally triggered Yes. Let it compact and increase in size. On Tue, Jul 23, 2013 at 9:38 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Jul 23, 2013 at 6:48 AM, Keith Wright kwri...@nanigans.com wrote:

Decommission an entire DC

2013-07-23 Thread Lanny Ripple
Hi, We have a multi-dc setup using DC1:2, DC2:2. We want to get rid of DC1. We're in the position where we don't need to save any of the data on DC1. We know we'll lose a (tiny. already checked) bit of data but our processing is such that we'll recover over time. How do we drop DC1 and just

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Marcelo Elias Del Valle
Dean, I am using hadoop 1.0.3. Indeed, using Cassandra 1.2.3 with Random partitioner, it worked. However, it's the only reason for me to use randompartitioner, I really would like to move forward. Besides, I tried to use Cassandra 1.2.6 with RandomPartitioner and I got problems when

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Perhaps try 0.20.2 as 1. The maven pom files have cassandra depending on 0.20.2 2. The 0.20.2 default was murmur and we had to change it to random partitioner or it wouldn't work for us Ie. I suspect they will change the pom file to a more recent version of hadoop at some point but I

Re: cassandra 1.2.6 - Start key's token sorts after end token

2013-07-23 Thread Hiller, Dean
Oh, and in the past 0.20.x has been pretty stable by the way... they finally switched their numbering scheme thank god. Dean On 7/23/13 2:13 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Perhaps try 0.20.2 as 1. The maven pom files have cassandra depending on 0.20.2 2. The 0.20.2 default was

Re: Unable to describe table in CQL 3

2013-07-23 Thread Shahab Yunus
Rahul, See this as it was discussed earlier: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Representation-of-dynamically-added-columns-in-table-column-family-schema-using-cqlsh-td7588997.html Regards, Shahab On Tue, Jul 23, 2013 at 2:51 PM, Rahul Gupta

Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-23 Thread Shahab Yunus
See this as this was discussed earlier: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Representation-of-dynamically-added-columns-in-table-column-family-schema-using-cqlsh-td7588997.html Regards, Shahab On Fri, Jul 12, 2013 at 11:13 AM, Shahab Yunus

Re: Decommission an entire DC

2013-07-23 Thread Omar Shibli
All you need to do is to decrease the replication factor of DC1 to 0, and then decommission the nodes one by one, I've tried this before and it worked with no issues. Thanks, On Tue, Jul 23, 2013 at 10:32 PM, Lanny Ripple la...@spotright.com wrote: Hi, We have a multi-dc setup using DC1:2,
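The two steps suggested above (drop the replicas held by DC1, then decommission its nodes one at a time) might be laid out as in the sketch below. This is only an illustration under stated assumptions: it assumes a keyspace using NetworkTopologyStrategy and CQL3 `ALTER KEYSPACE` syntax, it removes DC1 from the replication map rather than writing an explicit 0 (which I understand to have the same effect), and the function name `drop_dc_plan`, the keyspace name, and the host addresses are all hypothetical. It only builds the commands as strings; it does not run anything.

```python
def drop_dc_plan(keyspace, remaining_dcs, nodes_to_remove):
    """Build an ordered list of commands for retiring a data center.

    remaining_dcs: mapping of data center name -> replication factor to keep.
    nodes_to_remove: hosts in the retired data center (hypothetical addresses).
    """
    # Step 1: stop replicating to the retired DC by leaving it out of the map.
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in sorted(remaining_dcs.items())]
    alter = f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(parts)}}};"

    # Step 2: decommission the retired DC's nodes one by one.
    decommissions = [f"nodetool -h {host} decommission" for host in nodes_to_remove]
    return [alter] + decommissions


if __name__ == "__main__":
    for step in drop_dc_plan("myks", {"DC2": 2}, ["10.0.1.1", "10.0.1.2"]):
        print(step)
```

Keeping the commands as plain strings makes it easy to review the plan before running each step by hand, waiting for each decommission to finish before starting the next.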

unable to compact large rows

2013-07-23 Thread Paul Ingalls
I'm getting constant exceptions during compaction of large rows. In fact, I have not seen one work, even starting from an empty DB. As soon as I start pushing in data, when a row hits the large threshold, it fails compaction with this type of stack trace: INFO [CompactionExecutor:6]

Re: get all row keys of a table using CQL3

2013-07-23 Thread Blake Eggleston
Hi Jimmy, Check out the token function: http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results You can use it to page through your rows. Blake On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote: hi, I want to fetch all the row keys of a table using CQL3:
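Paging with the token function, as suggested above, could look roughly like this sketch. It is not tied to any particular driver: `execute` is a hypothetical callable that takes a CQL string plus parameters and returns a list of row keys in token order, and the `%s` placeholder style is an assumption that depends on the client library. Each page asks for keys whose token is greater than the token of the last key already seen.

```python
def page_query(table, key_column, last_key=None, page_size=1000):
    """Return (cql, params) for the next page of row keys."""
    if last_key is None:
        # First page: no lower bound on the token.
        return f"SELECT {key_column} FROM {table} LIMIT {page_size}", ()
    cql = (
        f"SELECT {key_column} FROM {table} "
        f"WHERE token({key_column}) > token(%s) LIMIT {page_size}"
    )
    return cql, (last_key,)


def iter_row_keys(execute, table, key_column, page_size=1000):
    """Yield every row key by repeatedly fetching the next token-ordered page."""
    last_key = None
    while True:
        cql, params = page_query(table, key_column, last_key, page_size)
        rows = execute(cql, params)
        if not rows:
            return
        for key in rows:
            yield key
        last_key = rows[-1]
        if len(rows) < page_size:
            return  # a short page means there is nothing left
```

Note that the comparison is on tokens, not on key values, so with a hashing partitioner the keys come back in token order rather than in key order.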

Re: get all row keys of a table using CQL3

2013-07-23 Thread Jimmy Lin
hi Blake, arh okay, token function is nice. But I am still bit confused by the word page through all rows select id from mytable where token(id) token(12345) it will return all rows whose partition key's corresponding token that is 12345 ? I guess my question #1 still there, that does this