Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Deno Vichas
On 4/25/2012 11:34 PM, Shubham Srivastava wrote: What's the best way (or the only way) to take a cluster-wide backup of Cassandra? Can't find much documentation on the same. I am using a MultiDC setup with Cassandra 0.8.6. Regards, Shubham here's how i'm doing it in AWS land using the

RE: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
Thanks a lot, Deno. A bit surprised that an equivalent command isn't there with nodetool. Not sure if it is in the latest release. BTW this makes it a prerequisite that all the data files of Cassandra, be it index or filters etc., will have unique names across the cluster. Is this a reasonable

Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Deno Vichas
there's no prerequisite for unique names. each node's snapshot gets tar'ed up and then copied into a directory named after that node's hostname. those dirs are then tar'ed and copied to S3. what i haven't tried yet is untarring everything from all nodes into a single-node cluster. i'm
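The workflow described above (per-node snapshot, archived under a directory named for the node's hostname, then shipped off-box) can be sketched as follows. This is a minimal illustration, not Deno's actual script: the function name, the throwaway directories, and the `snapshot.tar.gz` filename are all made up, and a real run would start from `nodetool snapshot` output and end with a copy to S3.

```python
import os
import tarfile
import tempfile

def collect_node_snapshot(archive_root, hostname, snapshot_dir):
    """Tar one node's snapshot under a directory named after its hostname,
    so identically named sstables from different nodes can never collide."""
    node_dir = os.path.join(archive_root, hostname)
    os.makedirs(node_dir, exist_ok=True)
    tar_path = os.path.join(node_dir, "snapshot.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(snapshot_dir, arcname=hostname)
    return tar_path

# Demo with throwaway directories standing in for real snapshot output.
root = tempfile.mkdtemp()
snap = tempfile.mkdtemp()
with open(os.path.join(snap, "Data.db"), "w") as f:
    f.write("sstable bytes")
paths = [collect_node_snapshot(root, h, snap) for h in ("node1", "node2")]
```

Because each archive lives under its own hostname directory, the two nodes' `Data.db` files end up at distinct paths even though their names are identical.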

Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
Your second part is what I was also referring to: I put all the files from the nodes onto a single node to create a similar backup, which needs unique file names across the cluster. From: Deno Vichas [mailto:d...@syncopated.net] Sent: Thursday, April 26, 2012 12:29 PM To:

Question regarding major compaction.

2012-04-26 Thread Fredrik
In the tuning documentation regarding Cassandra, it's recommended not to run major compactions. I understand what a major compaction is all about, but I'd like an in-depth explanation as to why reads will continually degrade until the next major compaction is manually invoked. From the doc: So

Re: Question regarding major compaction.

2012-04-26 Thread Ji Cheng
I'm also quite interested in this question. Here's my understanding of the problem. 1. If your workload is append-only, doing a major compaction shouldn't affect read performance too much, because each row appears in one sstable anyway. 2. If your workload is mostly updating existing rows,
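The distinction drawn in points 1 and 2 comes down to how many sstables hold a fragment of each row. A toy model (function name and sample batches invented for illustration; real Cassandra read paths also involve bloom filters and row caches, which this ignores):

```python
from collections import defaultdict

def sstables_per_row(flush_batches):
    """Count how many sstables contain a fragment of each row key.
    flush_batches: one set of row keys per memtable flush, i.e. per
    sstable written, before any compaction has merged them."""
    locations = defaultdict(int)
    for batch in flush_batches:
        for key in batch:
            locations[key] += 1
    return locations

# Append-only: every flush carries new keys, so each row lives in exactly
# one sstable and a read touches one file.
append_only = [{"a", "b"}, {"c", "d"}, {"e"}]

# Update-heavy: the same keys are rewritten on every flush, so a read must
# merge fragments from every sstable until a compaction collapses them.
update_heavy = [{"a", "b"}, {"a", "b"}, {"a", "b"}]
```

Under the update-heavy pattern, per-read work grows with the number of un-compacted sstables, which is the degradation the original question asks about.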

Re: Question regarding major compaction.

2012-04-26 Thread Fredrik
Exactly, but why would reads be significantly slower over time when including just one more, albeit sometimes large, SSTable in the read? Ji Cheng wrote 2012-04-26 11:11: I'm also quite interested in this question. Here's my understanding of the problem. 1. If your workload is

Maintain sort order on updatable property and pagination

2012-04-26 Thread Rajat Mathur
Hi All, I am using the property of columns that they are stored in sorted order to maintain sort orders (I believe everyone else is doing the same). But if I want to maintain sort order on a property whose value changes, I would have to perform a read and a delete operation. Is there a better way to solve
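The read-then-delete-then-insert pattern the question describes can be modelled with a toy sorted index (class and method names are invented for illustration; in Cassandra the sorted keys would be column names and the lookup table would be a second column family or a read of the old column):

```python
import bisect

class SortedIndex:
    """Toy model of a row whose column names encode a sort key.
    Because column names are immutable, changing a mutable property
    (e.g. a score) means deleting the old column and inserting a new one."""
    def __init__(self):
        self._keys = []     # sorted (score, item) "column names"
        self._by_item = {}  # item -> current score, for the read step

    def upsert(self, item, score):
        old = self._by_item.get(item)
        if old is not None:                       # read found an old value:
            self._keys.remove((old, item))        # delete the old column
        bisect.insort(self._keys, (score, item))  # insert the new column
        self._by_item[item] = score

    def ranked(self):
        return [item for _, item in self._keys]

idx = SortedIndex()
idx.upsert("alice", 50)
idx.upsert("bob", 80)
idx.upsert("alice", 90)  # score changed: old column removed, new one inserted
```

The auxiliary `_by_item` lookup is what makes the read cheap; without it, every update would need a scan to find the stale entry, which mirrors why the original poster needs a read before the delete.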

Re: nodetool repair hanging

2012-04-26 Thread Bill Au
My cluster is very small (300 MB) and compact was taking more than 2 hours. I ended up bouncing all the nodes. After that, I was able to run repair on all nodes, and each one takes less than a minute. If this happens again I will be sure to run compactionstats and netstats. Thanks for that

Data model question, storing Queue Message

2012-04-26 Thread Morgan Segalis
Hi everyone! I'm fairly new to Cassandra and I'm not quite familiarized with the column-oriented NoSQL model yet. I have worked a while on it, but I can't seem to find the best model for what I'm looking for. I have an Erlang software that lets users connect and communicate with each other,

RE: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
I was trying to get hold of all the data, kind of a global snapshot. I did the following: I copied all the snapshots from each individual node, where the snapshot data size was around 12GB per node, to a common folder (one folder alone). Strangely I found duplicate file names in multiple

Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Rob Coli
I copied all the snapshots from each individual node, where the snapshot data size was around 12GB on each node, to a common folder (one folder alone). Strangely I found duplicate file names in multiple snapshots and, more strangely, the data size was different for each duplicate file, which led
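The failure mode in this thread — flat-copying every node's snapshot into one folder so that same-named sstables silently overwrite each other — can be avoided by keeping the source hostname in the path or filename. A minimal sketch (function name and hostname prefix scheme are invented; note that real sstables renamed this way would not be loadable by Cassandra — this only illustrates why the flat copy loses data):

```python
import os
import shutil
import tempfile

def merge_snapshots(node_dirs, dest):
    """Copy per-node snapshot files into one folder, prefixing each file
    with its source hostname so same-named files don't clobber each other."""
    os.makedirs(dest, exist_ok=True)
    merged = []
    for host, src in sorted(node_dirs.items()):
        for name in sorted(os.listdir(src)):
            target = os.path.join(dest, f"{host}-{name}")
            shutil.copy(os.path.join(src, name), target)
            merged.append(target)
    return merged

# Two nodes whose snapshots contain identically named files of different
# sizes, matching the symptom described in the thread.
dirs = {}
for host, payload in (("node1", "aaaa"), ("node2", "bb")):
    d = tempfile.mkdtemp()
    with open(os.path.join(d, "Data.db"), "w") as f:
        f.write(payload)
    dirs[host] = d
out = merge_snapshots(dirs, tempfile.mkdtemp())
```

A flat copy of these two directories would leave one `Data.db` whose contents depend on copy order; the prefixed merge preserves both.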

Re: user Digest of: get.23021

2012-04-26 Thread Frank Ng
I am having the same issue in 1.0.7 with leveled compaction. It seems that the repair is flaky. It either completes relatively fast in a TEST environment (7 minutes) or gets stuck trying to receive a merkle tree from a peer that is already sending it the merkle tree. The only solution is to restart

Re: repair waiting for something

2012-04-26 Thread Frank Ng
I am having the same issue in 1.0.7 with leveled compaction. It seems that the repair is flaky. It either completes relatively fast in a TEST environment (7 minutes) or gets stuck trying to receive a merkle tree from a peer that is already sending it the merkle tree. The only solution is to restart

Is this possible.

2012-04-26 Thread Ed Jone
Hello, I am new to Cassandra and was hoping someone could tell me if the following is possible. Given I have a column family with a list of users in each row. Each user has the properties: name, highscore, x, y, z. I want to use name as the column key, but I want the columns to be sorted by

Re: Is this possible.

2012-04-26 Thread Data Craftsman
Data model: REM CQL 3.0 $ cqlsh --cql3 drop COLUMNFAMILY user_score_v3; CREATE COLUMNFAMILY user_score_v3 (name varchar, highscore float, x int, y varchar, z varchar, PRIMARY KEY (name, highscore)); DML is as usual, as common as RDBMS SQL. Query: Top 3, SELECT name, highscore,

Re: Is this possible.

2012-04-26 Thread Data Craftsman
DML example, insert into user_score_v3(name, highscore, x,y,z) values ('abc', 299.76, 1001, '*', '*'); ... 2012/4/26 Data Craftsman database.crafts...@gmail.com: Data model: REM CQL 3.0 $ cqlsh --cql3 drop COLUMNFAMILY user_score_v3; CREATE COLUMNFAMILY user_score_v3 (name
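For reference, the "top N by highscore" result the thread is after can be sketched client-side. The rows below are shaped like the `user_score_v3` example, but the names and scores (beyond the one `'abc'` insert shown) are made up, and `top_n` is an invented helper, not part of any CQL driver:

```python
import heapq

# Rows shaped like the user_score_v3 column family above.
users = [
    {"name": "abc", "highscore": 299.76},
    {"name": "def", "highscore": 512.0},
    {"name": "ghi", "highscore": 101.5},
]

def top_n(rows, n):
    """Client-side 'top N users by highscore' over fetched rows."""
    return heapq.nlargest(n, rows, key=lambda r: r["highscore"])
```

Note that with `PRIMARY KEY (name, highscore)`, highscore ordering is per-partition (per name), so a global leaderboard still needs a sort like this on the client, or a different data model.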

Node join streaming stuck at 100%

2012-04-26 Thread Bryce Godfrey
This is the second node I've joined to my cluster in the last few days, and so far both have become stuck at 100% on a large file according to netstats. This is on 1.0.9; is there anything I can do to make it move on besides restarting Cassandra? I don't see any errors or warnings in the logs for

Map reduce without hdfs

2012-04-26 Thread ruslan usifov
Hello to all! Is it possible to launch only the Hadoop MapReduce task tracker and job tracker against a Cassandra cluster, and not launch HDFS (using something else for shared storage)? Thanks

Re: Map reduce without hdfs

2012-04-26 Thread Edward Capriolo
That is one of the perks of Brisk and later DataStax Enterprise. As it stands, the datanode component is only used as a distributed cache for jars. So if your job uses Cassandra for the input format and output format, you only need the other components for temporary storage. On Thu, Apr 26, 2012 at

RE: Taking a Cluster Wide Snapshot

2012-04-26 Thread Shubham Srivastava
Thanks a lot, Rob. On another thought, I could also try copying the data of my keyspace alone from one node to another node in the new cluster (I have both the old and new clusters with the same nodes, DC1:6, DC2:6, and the same tokens). Would there be any risk of the new cluster

Re: Taking a Cluster Wide Snapshot

2012-04-26 Thread Rob Coli
On Thu, Apr 26, 2012 at 10:38 PM, Shubham Srivastava shubham.srivast...@makemytrip.com wrote: On another thought I could also try copying the data of my keyspace alone from one node to another node in the new cluster (I have both the old and new clusters having same nodes DC1:6,DC2:6 with