Re: Passing client as parameter

2010-06-10 Thread Ran Tavory
You can look at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java. So, to close the client, you can just get the transport out of the client (bold): private void closeClient(CassandraClient cclient) { log.debug(Closing

Granularity SSTables.

2010-06-10 Thread xavier manach
Hi. I am trying to understand tricks that I can use with SSTables for faster manipulation of data in clusters. I learned how to copy a keyspace from the data directories to a new node and change the replication factor (thx Jonathan). If I understood correctly, each SSTable has 3 files:

single node capacity

2010-06-10 Thread hive13 Wong
Hi, How much data load can a single typical Cassandra instance handle? It seems like we are getting into trouble when the load on one of our nodes grows beyond 200g. Both read latency and write latency are increasing, varying from 10 to several thousand milliseconds. Machine config is 16*cpu

Best way of adding new nodes

2010-06-10 Thread hive13 Wong
Hi, guys. There are 2 ways of adding new nodes. When we add with bootstrapping, since we've already got lots of data, it will often take many hours to complete the bootstrapping and will probably affect the performance of existing nodes. But if we add without bootstrapping, the data load on the new node could

keyrange for get_range_slices

2010-06-10 Thread Dop Sun
Hi, As documented at http://wiki.apache.org/cassandra/API, both ends of the key range for get_range_slices are inclusive. As discussed in this thread: http://groups.google.com/group/jassandra-user/browse_thread/thread/c2e56453cde067d3, there is a case where a user wants to discover all keys
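Because both ends of the range are inclusive, a client that pages through keys by reusing the last key of one page as the start of the next will see that key twice. The following is a minimal sketch of that paging pattern, not Cassandra client code: `range_slice` is a hypothetical in-memory stand-in for get_range_slices, and it assumes keys come back in sorted order (as with an order-preserving partitioner).

```python
from bisect import bisect_left


def range_slice(sorted_keys, start_key, count):
    """Hypothetical stand-in for get_range_slices.

    Returns up to `count` keys that are >= start_key, so the start key
    itself is included (inclusive range start).
    """
    i = bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + count]


def scan_all_keys(sorted_keys, page_size=3):
    """Discover every key page by page, dropping the repeated boundary key."""
    if page_size < 2:
        # With one key per page the inclusive start would return the same
        # key forever, so the scan could never advance.
        raise ValueError("page_size must be at least 2")
    found = []
    start = ""  # an empty start key means "from the beginning"
    while True:
        page = range_slice(sorted_keys, start, page_size)
        # The first key of every page after the first was already seen.
        if found and page and page[0] == found[-1]:
            fresh = page[1:]
        else:
            fresh = page
        found.extend(fresh)
        if len(page) < page_size:
            return found  # a short page means the range is exhausted
        start = page[-1]


if __name__ == "__main__":
    print(scan_all_keys(["a", "b", "c", "d", "e", "f", "g"]))
```

Each page after the first contributes only `page_size - 1` new keys, which is the cost of the inclusive start.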

Re: keyrange for get_range_slices

2010-06-10 Thread Philip Stanhope
No ... and I personally don't have a problem with this if you think about what is actually going on under the covers. Note, however, that this is an expensive operation and as a result if there are parallel updates to the indexes while you are performing a full keyscan (rowscan) you will

Running Cassandra as a Windows Service

2010-06-10 Thread Kochheiser,Todd W - TO-DITT1
For various reasons I am required to deploy systems on Windows. As such, I went looking for information on running Cassandra as a Windows service. I've read some of the user threads regarding running Cassandra as a Windows service, such as this one:

Re: Running Cassandra as a Windows Service

2010-06-10 Thread Ben Standefer
"For various reasons I am required to deploy systems on Windows." I don't think it would be difficult to argue the business case for running Cassandra on Linux. It's still a young project and everybody in IRC and the mailing list is running it on Linux. You should really re-think whatever factors

read operation is slow

2010-06-10 Thread Caribbean410
Hello, I am testing the performance of Cassandra. We write 200k records to the database, and each record is 1k in size. Then we read these 200k records. It takes more than 400s to finish the read, which is much slower than MySQL (around 20s). I read some discussions online, and someone suggested making
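To put the reported numbers side by side, here is a small back-of-the-envelope calculation. It uses only the figures quoted in the message (200k records, more than 400s for Cassandra, around 20s for MySQL) and assumes the reads are issued one after another, which the message does not state.

```python
RECORDS = 200_000  # records read back, from the message


def reads_per_second(records, seconds):
    """Average throughput over the whole run."""
    return records / seconds


def ms_per_read(records, seconds):
    """Average time per read, assuming reads run one after another."""
    return seconds * 1000 / records


if __name__ == "__main__":
    for name, seconds in (("Cassandra", 400), ("MySQL", 20)):
        print(name,
              reads_per_second(RECORDS, seconds), "reads/s,",
              ms_per_read(RECORDS, seconds), "ms per read")
```

If the client really is reading sequentially, a per-read time on the order of milliseconds suggests that round-trip latency per request, rather than raw server throughput, may dominate the total.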

Re: Best way of adding new nodes

2010-06-10 Thread Jonathan Ellis
It's not just a matter of being balanced: if you add new nodes without bootstrapping, the others will think it has data on it that hasn't actually been moved there. On Thu, Jun 10, 2010 at 6:50 AM, hive13 Wong hiv...@gmail.com wrote: Hi, guys. There are 2 ways of adding new nodes. When we add with

Re: cassandra out of heap space crash

2010-06-10 Thread Ran Tavory
I can't say exactly how much memory is the correct amount, but surely 1G is very little. By replicating 3 times, your cluster now does 3 times more work than it used to, both on reads and on writes, while the readers/writers continue hammering it at the same pace. So once you've upped your memory

Re: scans stopped returning values for some keys

2010-06-10 Thread Jonathan Ellis
How is your CF defined? (What comparator?) Did you try start=empty byte array instead of Long.MAX_VALUE? On Wed, Jun 9, 2010 at 8:06 AM, Pawel Dabrowski pa...@reviewpro.com wrote: Hi, I'm using Cassandra to store some aggregated data in a structure like this: KEY - product_id SUPER COLUMN

cassandra out of heap space crash

2010-06-10 Thread Julie
I am running an 8 node cassandra cluster with each node on its own dedicated VM. My app very quickly populates the database with about 100,000 rows of data (each row is about 100K bytes) times the number of nodes in my cluster so there's about 100,000 rows of data on each node (seems very evenly

RE: Running Cassandra as a Windows Service

2010-06-10 Thread Kochheiser,Todd W - TO-DITT1
I agree that bitrot might happen if all of the core Cassandra developers are using Linux. Your suggestion of putting things in a contrib area where curious (or desperate) parties suffering on the Windows platform could pick it up seems like a reasonable place to start. It might also be an

RE: keyrange for get_range_slices

2010-06-10 Thread Dop Sun
Thanks for your quick and detailed explanation of the key scan. This is really helpful! Dop From: Philip Stanhope [mailto:pstanh...@wimba.com] Sent: Thursday, June 10, 2010 10:40 PM To: user@cassandra.apache.org Subject: Re: keyrange for get_range_slices No ... and I personally don't have

Re: Granularity SSTables.

2010-06-10 Thread Jonathan Ellis
Only if your clusters have the same number of nodes, with the same tokens. Trying to get too clever is not usually advisable. On Thu, Jun 10, 2010 at 3:54 AM, xavier manach x...@tekio.org wrote: Hi. I am trying to understand tricks that I can use with SSTables for faster manipulation of

Re: File Descriptor leak

2010-06-10 Thread Jonathan Ellis
Fixed in https://issues.apache.org/jira/browse/CASSANDRA-1178 On Thu, Jun 10, 2010 at 9:01 AM, Matt Conway m...@backupify.com wrote: Hi All, I'm running a small 4-node cluster with minimal load using the 2010-06-08_12-31-16 build from trunk, and it's exhausting file descriptors pretty quickly

RE: Range Slices timing question

2010-06-10 Thread Carlos Sanchez
Thx a lot -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, June 10, 2010 4:28 PM To: user@cassandra.apache.org Subject: Re: Range Slices timing question get_range_slices is faster in 0.7 but there's not much you can do in 0.6. On Wed, Jun 9, 2010 at

Cassandra Write Performance, CPU usage

2010-06-10 Thread Rishi Bhardwaj
Hi, I am investigating Cassandra write performance and see very heavy CPU usage from Cassandra. I have a single-node Cassandra instance running on a dual-core (2.66 GHz Intel) Ubuntu 9.10 server. The writes to Cassandra are being generated from the same server using BatchMutate(). The client

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread vd
Hi Rishi, Writes in Cassandra are not written directly to the disk; they are stored in memory and flushed to the disk later on. Maybe that's why you are not getting much out of iostat. Can't say about the high CPU usage. ___ Vineet Daniel
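The buffering described above can be sketched roughly as follows. This is a toy model, not Cassandra's implementation: the class name, the entry-count flush threshold, and the in-memory list standing in for on-disk SSTables are all assumptions made for illustration. Real Cassandra also appends each write to a commit log on disk for durability, which this sketch leaves out.

```python
class MemtableSketch:
    """Toy write path: buffer writes in memory, flush them in batches."""

    def __init__(self, flush_threshold=3):
        # Hypothetical threshold: flush once this many keys are buffered.
        self.flush_threshold = flush_threshold
        self.memtable = {}   # in-memory buffer of recent writes
        self.flushed = []    # each dict stands in for one on-disk SSTable

    def write(self, key, value):
        # A write only touches memory until the buffer fills up.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the buffer as one sorted, immutable batch and start over.
        if self.memtable:
            self.flushed.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Newest data wins: check memory first, then newest flush first.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.flushed):
            if key in sstable:
                return sstable[key]
        return None


if __name__ == "__main__":
    store = MemtableSketch(flush_threshold=3)
    for i in range(7):
        store.write(f"k{i}", f"v{i}")
    print(len(store.flushed), "flushes,", len(store.memtable), "key(s) still in memory")
```

In this model, disk activity happens in occasional bursts at flush time rather than once per write, which is consistent with a tool like iostat showing little activity between flushes.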

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread Rishi Bhardwaj
Hi Jonathan, Thanks for such an informative reply. My application may end up doing such continuous bulk writes to Cassandra, and thus I was interested in such a performance case. I was wondering what all the CPU overheads are for each row/column written to Cassandra. You mentioned updating

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread Jonathan Shook
Rishi, I am not yet knowledgeable enough to answer your question in more detail. I would like to know more about the specifics as well. There are counters you can use via JMX to show logical events, but this will not always translate to good baseline information that you can use in scaling