Re: Using Cassandra for storing large objects

2011-01-27 Thread aaron morton
Millions of rows/items is no problem, megabytes per item is doable. Generally people have talked about chunking blobs and storing them across multiple columns. See http://wiki.apache.org/cassandra/LargeDataSetConsiderations http://wiki.apache.org/cassandra/CassandraLimitations Hope that
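The chunking approach Aaron describes can be sketched as follows. The chunk size and column-naming scheme here are illustrative assumptions, not anything from the thread or a real client API:

```python
# Sketch: splitting a large blob into fixed-size chunks so each Cassandra
# column stays small. CHUNK_SIZE and the "chunk_NNNNNN" naming are
# illustrative choices, not a Cassandra convention.

CHUNK_SIZE = 1024 * 1024  # 1 MB per column

def split_blob(blob):
    """Return a list of (column_name, chunk) pairs for one row."""
    return [
        ("chunk_%06d" % i, blob[off:off + CHUNK_SIZE])
        for i, off in enumerate(range(0, len(blob), CHUNK_SIZE))
    ]

def join_chunks(columns):
    """Reassemble the blob from (name, chunk) pairs, sorted by column name."""
    return b"".join(chunk for _, chunk in sorted(columns))
```

Each (name, chunk) pair would then be written as one column under the object's row key; zero-padded names keep the columns in chunk order.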

Re: Generating tokens for Cassandra cluster with ByteOrderedPartitioner

2011-01-27 Thread aaron morton
You will need to understand the possible range of key values your application will create, and then split those up to balance the load around your cluster. In general the RandomPartitioner is a good first step. Why are you going with the ByteOrderedPartitioner? Aaron On 27 Jan 2011, at
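For comparison, balancing a RandomPartitioner ring is mechanical: initial tokens are spread evenly over the 0..2**127 token space. A minimal sketch (node count is illustrative):

```python
# Sketch: evenly spaced initial tokens for a RandomPartitioner ring,
# whose token space is 0 .. 2**127.

def initial_tokens(node_count):
    """Return one evenly spaced token per node."""
    step = 2 ** 127 // node_count
    return [i * step for i in range(node_count)]
```

With ByteOrderedPartitioner there is no such formula; the tokens must be chosen from the actual key distribution, which is why Aaron asks about the expected range of key values.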

Re: repair cause large number of SSTABLEs

2011-01-27 Thread aaron morton
The ArrayIndexOutOfBounds in the ReadStage looks like it can happen if a key is not of the expected type. Could the comparator for the CF have changed ? The error in the RequestResponseStage may be the race condition identified here https://issues.apache.org/jira/browse/CASSANDRA-1959 Aaron

Re: Why does cassandra stream data when moving tokens?

2011-01-27 Thread aaron morton
It will help if you can include the output from some of the tools, e.g. nodetool ring nodetool netstats Aaron On 27 Jan 2011, at 16:17, buddhasystem wrote: Removetoken command just never returns. There is nothing streaming in the cluster. Anyone knows what might be happening?

Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem
Will it work for a billion rows? Because that's where eventually I'll end up being. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-for-storing-large-objects-tp5965418p5966284.html Sent from the

unsubscribe

2011-01-27 Thread laradji nacer
-- Laradji nacer n.laradji at ovea dot com Systems and Networks Project Manager | Founder, ovea http://www.ovea.com Tel: +33 4 6767 Mobile: +33 6 1059 6883

Re: repair cause large number of SSTABLEs

2011-01-27 Thread Todd Burruss
The comparator has not changed. Sent from my Android phone using TouchDown (www.nitrodesk.com) -Original Message- From: aaron morton [aa...@thelastpickle.com] Received: Thursday, 27 Jan 2011, 1:10am To: user@cassandra.apache.org [user@cassandra.apache.org] Subject: Re: repair cause large

unsubscribe

2011-01-27 Thread Sudhakar Mambakkam

unsubscribe

2011-01-27 Thread Whitwham, Adrian
From: Sudhakar Mambakkam [mailto:mnsudha...@yahoo.com] Sent: 27 January 2011 15:28 To: cassandra-u...@incubator.apache.org Subject: unsubscribe ** Meteor Mobile Communications Limited, trading as Meteor. Registered Office: 1 Heuston South Quarter, St.

Re: repair cause large number of SSTABLEs

2011-01-27 Thread Matthew Conway
Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992 ? On Jan 27, 2011, at Thu Jan 27, 1:22 AM, B. Todd Burruss wrote: i ran out of file handles on the repairing node after doing nodetool repair - strange as i have never had this issue until using 0.7.0 (but i should say

RE: repair cause large number of SSTABLEs

2011-01-27 Thread Todd Burruss
thx, but i didn't do anything like removing/adding nodes. just did a nodetool repair after running for an hour or so on a clean install From: Matthew Conway [m...@backupify.com] Sent: Thursday, January 27, 2011 8:17 AM To: user@cassandra.apache.org

unsubscribe

2011-01-27 Thread Vũ Anh Tuấn

Slow writes after migration to 0.7

2011-01-27 Thread Roberto Bentivoglio
Hi all, we're moving our environment to the new version of Cassandra (from 0.6.8). We're using the default configuration and we've enabled only the key cache. We have a strange behaviour with write performance: the code is still the same, but writes are 4 to 6 times slower.

Re: repair cause large number of SSTABLEs

2011-01-27 Thread Brandon Williams
On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote: thx, but i didn't do anything like removing/adding nodes. just did a nodetool repair after running for an hour or so on a clean install It affects anything that involves streaming. -Brandon

Re: repair cause large number of SSTABLEs

2011-01-27 Thread B. Todd Burruss
ok thx. what about the repair creating hundreds of new sstables and lsof showing cassandra using currently over 800 Data.db files? is this normal? On 01/27/2011 08:40 AM, Brandon Williams wrote: On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com mailto:bburr...@real.com wrote:

unsubscribe

2011-01-27 Thread Kirk Gilmore

reduced cached mem; resident set size growth

2011-01-27 Thread Chris Burroughs
We have a 6 node Cassandra 0.6.8 cluster running on boxes with 4 GB of RAM. Over the course of several weeks cached memory slowly decreases until Cassandra is restarted or something bad happens (ie oom killer). Performance obviously suffers as cached memory is no longer available. Here is a graph

unsubscribe

2011-01-27 Thread Michael Poole
Unsubscribe Thx On Thu, Jan 27, 2011 at 11:17 AM, Matthew Conway m...@backupify.com wrote: Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992 ? On Jan 27, 2011, at Thu Jan 27, 1:22 AM, B. Todd Burruss wrote: i ran out of file handles on the repairing node after doing

Lucandra Limitations

2011-01-27 Thread David G. Boney
I was reviewing the Lucandra schema presented on the below page at Datastax: http://www.datastax.com/docs/0.7/data_model/lucandra In the TermInfo Super Column Family, docID is the key for a supercolumn. Does this imply that the maximum number of documents that can be indexed for a term with

Re: Lucandra Limitations

2011-01-27 Thread Paul Brown
Lucene trades in (32-bit) ints internally, so I expect you're just seeing a projection of that limitation. On Jan 27, 2011, at 10:40 AM, David G. Boney wrote: I was reviewing the Lucandra schema presented on the below page at Datastax: http://www.datastax.com/docs/0.7/data_model/lucandra

Re: Lucandra Limitations

2011-01-27 Thread Jake Luciani
Yes, but that's also the lucene limit http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations Lucene uses a Java int to refer to document numbers, and the index file format uses an Int32 On Thu, Jan 27, 2011 at 1:40 PM, David G. Boney dbon...@semanticartifacts.com wrote: I was
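The limit Jake cites follows directly from the signed 32-bit document number:

```python
# Lucene document numbers are Java ints (signed 32-bit), so a single
# Lucene index is capped at 2**31 - 1 documents.
MAX_DOCS = 2 ** 31 - 1  # 2,147,483,647
```

Anything beyond roughly 2.1 billion documents therefore has to be split across multiple indexes, which is the motivation for the Solandra sub-index approach mentioned later in the thread.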

Re: repair cause large number of SSTABLEs

2011-01-27 Thread Stu Hood
When the destination node fails to open the streamed SSTable, we assume it was corrupted during transfer, and retry the stream. Independent of the exception posted above, it is a problem that the failed transfers were not cleaned up. How many of the data files are marked as -tmp-? On Jan 27, 2011

Re: repair cause large number of SSTABLEs

2011-01-27 Thread B. Todd Burruss
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Data.db | grep -c -v \-tmp\-
824
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*-tmp-*Data.db | wc -l
829
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Comp* | wc -l
247
On 01/27/2011 11:14

Re: Using Cassandra for storing large objects

2011-01-27 Thread Anand Somani
Using it for storing large immutable objects; like Aaron was suggesting, we are splitting the blob across multiple columns. Also we are reading it a few columns at a time (for memory considerations). Currently we have only gone up to about 300-400 KB size objects. We do have machines with 32 GB
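The "few columns at a time" read pattern Anand describes can be sketched as a batched generator. `fetch_columns` stands in for a column slice query against the row; it is a hypothetical callable here, not a real client API:

```python
# Sketch: reading a chunked blob a few columns at a time so the whole
# object is never held in memory at once. batch_size and fetch_columns
# are illustrative assumptions.

def stream_blob(column_names, fetch_columns, batch_size=4):
    """Yield chunk data in small batches instead of one huge read.

    column_names : chunk column names, already in chunk order
    fetch_columns: callable taking a list of names, returning
                   (name, data) pairs -- stands in for a slice query
    """
    for i in range(0, len(column_names), batch_size):
        for _name, data in fetch_columns(column_names[i:i + batch_size]):
            yield data
```

The caller can write each yielded chunk straight to its destination (a socket or file), keeping memory use bounded by batch_size times the chunk size.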

Re: Lucandra Limitations

2011-01-27 Thread David G. Boney
I am new to Lucene and Lucandra. My use case is that I have a trillion URIs to index with Lucene. Each URI is either a resource or literal in an RDF graph. Each URI is a document for Lucene. If I were using Lucene, my understanding is that it would create a segment, stuff as many URIs in the

Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem
I would ask myself a different question, which is what media-hosting sites use (YouTube and all others). Cassandra still may have its usefulness here as a mapper between a logical id and physical file location. -- View this message in context:

Re: Lucandra Limitations

2011-01-27 Thread Jake Luciani
The latest iteration of Lucandra, called Solandra, creates localized sub-indexes of size N and spreads them around the Cassandra ring. Then, using Solr, it searches all the sub-indexes in parallel behind the scenes. This approach should give you what you need and it would be great to have such a

Re: Using Cassandra for storing large objects

2011-01-27 Thread Narendra Sharma
Thanks Anand. A few questions: - What is the size of nodes (in terms of data)? - How long have you been running? - How is compaction treating you? Thanks, Naren On Thu, Jan 27, 2011 at 12:13 PM, Anand Somani meatfor...@gmail.com wrote: Using it for storing large immutable objects, like Aaron was

Re: unsubscribe

2011-01-27 Thread Aaron Morton
http://wiki.apache.org/cassandra/FAQ#unsubscribe How do I unsubscribe from the email list? Send an email to user-unsubscr...@cassandra.apache.org On 28 Jan, 2011, at 07:10 AM, Michael Poole mdpool...@gmail.com wrote: Unsubscribe Thx On Thu, Jan 27, 2011 at 11:17 AM, Matthew Conway m...@backupify.com

Re: Node going down when streaming data, what next?

2011-01-27 Thread buddhasystem
OK, after running repair and waiting overnight the rebalancing worked and now 3 nodes share the load as I expected. However, one node that is broken is still listed in the ring. I have no intention of reviving it. What's the optimal way to get rid of it as far as the ring configuration is

Looking for Cassandra work.

2011-01-27 Thread aaron morton
I've decided to leave Weta Digital so I can spend more time working on and with Cassandra. If you would like to hire me from mid March please contact me directly on aa...@thelastpickle.com I'm an Australian based in New Zealand and have skills in Python, Java, C#, Cassandra and other NoSQL stores

Has anyone seen column deletes that seem not to actually delete the column?

2011-01-27 Thread Scott McCarty
Hi, I'm trying to figure out what's going on with some column removes that don't seem to be taking hold. This particular test is being done on a single node cluster running 0.6.8 with CL=QUORUM on the writes (which shouldn't matter, I'd think). What I'm seeing in our client log files is that a

Re: Has anyone seen column deletes that seem not to actually delete the column?

2011-01-27 Thread Aaron Morton
No current issues with delete that I know of. To be safe can you upgrade to 0.7 or 0.6.10? Some things to check: - When you read the column back in the CLI does it have the timestamp you expected? - Until proven otherwise assume it's a client side thing, can you add some more logging into your app

Re: Using Cassandra for storing large objects

2011-01-27 Thread Anand Somani
At this point we are not in production, in the lab only. The longest test so far has been about 2-3 days; the data size at this point is about 2-3 TB per node, and we have 2 nodes. We do see spikes to high response times (and timeouts), which seemed to be around the time GC kicks in. We were pushing

Re: Why does cassandra stream data when moving tokens?

2011-01-27 Thread Robert Coli
On Thu, Jan 27, 2011 at 2:05 AM, aaron morton aa...@thelastpickle.com wrote: It will help if you can include the output from some of the tools,  e.g. nodetool ring nodetool netstats It will also help if you include the version of cassandra you are running. =Rob

Re: Using Cassandra for storing large objects

2011-01-27 Thread Narendra Sharma
Thanks Anand. Let's keep exchanging our experiences. -Naren On Thu, Jan 27, 2011 at 8:50 PM, Anand Somani meatfor...@gmail.com wrote: At this point we are not in production, in the lab only. The longest test so far has been about 2-3 days, the datasize at this point is about 2-3 TB per node,