Millions of rows/items is not a problem, and megabytes per item is doable. Generally, people have handled larger blobs by chunking them and storing the chunks across multiple columns (a sketch of that approach follows the links below).
See
http://wiki.apache.org/cassandra/LargeDataSetConsiderations
http://wiki.apache.org/cassandra/CassandraLimitations
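For illustration, here is a rough sketch of that chunking approach using a pycassa-style Python client; the keyspace, column family, and column names are hypothetical, and the chunk size is just a starting point:

import pycassa

CHUNK_SIZE = 512 * 1024   # 512 KB per column; tune to your memory budget

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
blobs = pycassa.ColumnFamily(pool, 'Blobs')   # hypothetical CF for blob chunks

def store_blob(key, data):
    # one numbered column per chunk; zero-padded names keep them in order
    columns = {}
    for n, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        columns['chunk_%08d' % n] = data[offset:offset + CHUNK_SIZE]
    blobs.insert(key, columns)

For very large blobs you would batch the insert into several mutations rather than sending every chunk in one call.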
Hope that helps.
You will need to understand the possible range of key values your application
will create, and then split those up to balance the load around your cluster.
In general the RandomPartitioner is a good first step. Why are you going with
the ByteOrderedPartitioner?
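For what it's worth, with the RandomPartitioner a common recipe is to pre-compute evenly spaced initial tokens for the nodes rather than letting them pick their own; a small sketch (the node count is just an example):

# evenly spaced initial tokens for an N-node cluster under RandomPartitioner
# (token space is 0 .. 2**127)
def initial_tokens(node_count):
    return [i * (2 ** 127 // node_count) for i in range(node_count)]

print(initial_tokens(4))
# -> [0, 42535295865117307932921825928971026432,
#     85070591730234615865843651857942052864,
#     127605887595351923798765477786913079296]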
Aaron
On 27 Jan 2011, at
The ArrayIndexOutOfBounds in the ReadStage looks like it can happen if a key is
not of the expected type. Could the comparator for the CF have changed ?
The error in the RequestResponseStage may be the race condition identified here
https://issues.apache.org/jira/browse/CASSANDRA-1959
Aaron
It will help if you can include the output from some of the tools, e.g.
nodetool ring
nodetool netstats
Aaron
On 27 Jan 2011, at 16:17, buddhasystem wrote:
Removetoken command just never returns. There is nothing streaming in the
cluster.
Anyone knows what might be happening?
Will it work for a billion rows? Because that's where eventually I'll end up
being.
The comparator has not changed.
-Original Message-
From: aaron morton [aa...@thelastpickle.com]
Received: Thursday, 27 Jan 2011, 1:10am
To: user@cassandra.apache.org [user@cassandra.apache.org]
Subject: Re: repair cause large
From: Sudhakar Mambakkam [mailto:mnsudha...@yahoo.com]
Sent: 27 January 2011 15:28
To: cassandra-u...@incubator.apache.org
Subject: unsubscribe
Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992 ?
On Thu, Jan 27, 2011, at 1:22 AM, B. Todd Burruss wrote:
i ran out of file handles on the repairing node after doing nodetool repair
- strange as i have never had this issue until using 0.7.0 (but i should say
thx, but i didn't do anything like removing/adding nodes. just did a nodetool
repair after running for an hour or so on a clean install
From: Matthew Conway [m...@backupify.com]
Sent: Thursday, January 27, 2011 8:17 AM
To: user@cassandra.apache.org
Hi all,
we're moving our environment to the new version of Cassandra (from 0.6.8).
We're using the default configuration and we've enabled only the key cache.
We're seeing strange behaviour with write performance: the code is still the same, but writes are 4 to 6 times slower.
On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote:
thx, but i didn't do anything like removing/adding nodes. just did a
nodetool repair after running for an hour or so on a clean install
It affects anything that involves streaming.
-Brandon
ok thx. what about the repair creating hundreds of new sstables and
lsof showing cassandra currently using over 800 Data.db files? is this
normal?
On 01/27/2011 08:40 AM, Brandon Williams wrote:
On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote:
We have a 6 node Cassandra 0.6.8 cluster running on boxes with 4 GB of
RAM. Over the course of several weeks, cached memory slowly decreases
until Cassandra is restarted or something bad happens (i.e. the OOM killer).
Performance obviously suffers as cached memory is no longer available.
Here is a graph
Unsubscribe
Thx
On Thu, Jan 27, 2011 at 11:17 AM, Matthew Conway m...@backupify.com wrote:
Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992 ?
On Thu, Jan 27, 2011, at 1:22 AM, B. Todd Burruss wrote:
i ran out of file handles on the repairing node after doing
I was reviewing the Lucandra schema presented on the below page at Datastax:
http://www.datastax.com/docs/0.7/data_model/lucandra
In the TermInfo Super Column Family, docID is the key for a supercolumn. Does
this imply that the maximum number of documents that can be indexed for a term
with
Lucene trades in (32-bit) ints internally, so I expect you're just seeing a
projection of that limitation.
On Jan 27, 2011, at 10:40 AM, David G. Boney wrote:
I was reviewing the Lucandra schema presented on the below page at Datastax:
http://www.datastax.com/docs/0.7/data_model/lucandra
Yes, but that's also the lucene limit
http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations
Lucene uses a Java int to refer to document numbers, and the index file
format uses an Int32
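In other words, the ceiling is the largest signed 32-bit integer:

# Lucene document numbers are signed 32-bit Java ints
print(2 ** 31 - 1)   # 2147483647, i.e. roughly 2.1 billion documents per index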
On Thu, Jan 27, 2011 at 1:40 PM, David G. Boney
dbon...@semanticartifacts.com wrote:
I was
When the destination node fails to open the streamed SSTable, we assume it
was corrupted during transfer, and retry the stream. Independent of the
exception posted above, it is a problem that the failed transfers were not
cleaned up.
How many of the data files are marked as -tmp-?
On Jan 27, 2011
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Data.db | grep -c -v \-tmp\-
824
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*-tmp-*Data.db | wc -l
829
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Comp* | wc -l
247
On 01/27/2011 11:14
We are using it for storing large immutable objects. Like Aaron was suggesting,
we are splitting the blob across multiple columns, and we are reading it back a few
columns at a time (for memory considerations; a rough sketch of that kind of paged
read follows below). Currently we have only gone up to about 300-400 KB objects.
We do have machines with 32Gb
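A rough sketch of that kind of paged read with a pycassa-style client (keyspace, column family, and page size are hypothetical; assumes zero-padded chunk column names that sort correctly under the CF comparator):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
blobs = pycassa.ColumnFamily(pool, 'Blobs')

def read_blob(key, page_size=8):
    # fetch a few chunk columns per request instead of the whole row
    chunks = []
    last = None
    while True:
        if last is None:
            cols = list(blobs.get(key, column_count=page_size).items())
        else:
            # column_start is inclusive, so ask for one extra and drop the repeat
            cols = list(blobs.get(key, column_start=last,
                                   column_count=page_size + 1).items())[1:]
        chunks.extend(value for _name, value in cols)
        if len(cols) < page_size:
            break
        last = cols[-1][0]
    return ''.join(chunks)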
I am new to Lucene and Lucandra.
My use case is that I have a trillion URIs to index with Lucene. Each URI is
either a resource or a literal in an RDF graph. Each URI is a document for Lucene
If I were using Lucene, my understanding is that it would create a segment,
stuff as many URIs in the
I would ask myself a different question: what do media-hosting sites (YouTube
and all the others) use? Cassandra may still be useful here as a mapper between
a logical id and a physical file location.
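If you went that route, the Cassandra side could be as small as one lookup column family; a hypothetical sketch (all names made up):

import pycassa

pool = pycassa.ConnectionPool('MediaKeyspace', ['localhost:9160'])
locations = pycassa.ColumnFamily(pool, 'FileLocations')

# map a logical media id to wherever the bytes actually live
locations.insert('video:12345', {'url': 'http://media01.example.com/ab/cd/12345.bin',
                                 'size_bytes': '1048576'})

print(locations.get('video:12345')['url'])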
The latest iteration of Lucandra, called Solandra, creates localized
sub-indexes of size N and spreads them around the Cassandra ring. Then, using
Solr, it searches all the sub-indexes in parallel behind the scenes. This
approach should give you what you need, and it would be great to have such a
Thanks Anand. A few questions:
- What is the size of the nodes (in terms of data)?
- How long have you been running?
- How's compaction treating you?
Thanks,
Naren
On Thu, Jan 27, 2011 at 12:13 PM, Anand Somani meatfor...@gmail.com wrote:
Using it for storing large immutable objects, like Aaron was
http://wiki.apache.org/cassandra/FAQ#unsubscribe
How do I unsubscribe from the email list? Send an email to user-unsubscr...@cassandra.apache.org
On 28 Jan, 2011, at 07:10 AM, Michael Poole mdpool...@gmail.com wrote:
Unsubscribe
Thx
On Thu, Jan 27, 2011 at 11:17 AM, Matthew Conway m...@backupify.com
OK, after running repair and waiting overnight the rebalancing worked and
now 3 nodes share the load as I expected. However, one node that is broken
is still listed in the ring. I have no intention of reviving it. What's the
optimal way to get rid of it as far as the ring configuration is
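For reference, the usual flow in 0.6/0.7 is to find the dead node's token with nodetool ring and then issue removetoken from any live node; the host name and token below are made up:

nodetool -h live-node-1 ring
nodetool -h live-node-1 removetoken 85070591730234615865843651857942052864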
I've decided to leave Weta Digital so I can spend more time working on and with
Cassandra. If you would like to hire me from mid-March, please contact me
directly at aa...@thelastpickle.com
I'm an Australian based in New Zealand, with skills in Python, Java, C#,
Cassandra, and other NoSQL technologies.
Hi,
I'm trying to figure out what's going on with some column removes that don't
seem to be taking hold.
This particular test is being done on a single node cluster running 0.6.8
with CL=QUORUM on the writes (which shouldn't matter, I'd think).
What I'm seeing in our client log files is that a
No current issues with delete that I know of. To be safe, can you upgrade to 0.7 or 0.6.10?
Some things to check:
- When you read the column back in the CLI, does it have the timestamp you expected?
- Until proven otherwise assume it's a client-side thing; can you add some more logging into your app
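One client-side cause worth ruling out: Cassandra reconciles a delete against a write purely by timestamp, so a remove issued with a timestamp lower than the original insert's is silently ignored. A hedged sketch with a pycassa-style client (keyspace, CF, and key names are hypothetical):

import time
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Standard1')

now_us = int(time.time() * 1e6)   # client-supplied timestamps, in microseconds

# write with a deliberately high timestamp
cf.insert('row1', {'col1': 'value'}, timestamp=now_us + 1000000)

# a remove with a lower timestamp loses the reconciliation,
# so the column appears to survive the delete
cf.remove('row1', columns=['col1'], timestamp=now_us)

print(cf.get('row1'))   # 'col1' is still present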
At this point we are not in production, in the lab only. The longest test so
far has been about 2-3 days; the data size at this point is about 2-3 TB per
node, and we have 2 nodes. We do see spikes to high response times (and
timeouts), which seem to happen around the time GC kicks in. We were pushing
On Thu, Jan 27, 2011 at 2:05 AM, aaron morton aa...@thelastpickle.com wrote:
It will help if you can include the output from some of the tools, e.g.
nodetool ring
nodetool netstats
It will also help if you include the version of cassandra you are running.
=Rob
Thanks Anand. Let's keep exchanging our experiences.
-Naren
On Thu, Jan 27, 2011 at 8:50 PM, Anand Somani meatfor...@gmail.com wrote:
At this point we are not in production, in the lab only. The longest test
so far has been about 2-3 days, the datasize at this point is about 2-3 TB
per node,