Re: nodetool repair: No neighbors

2011-07-31 Thread Sylvain Lebresne
On Sun, Jul 31, 2011 at 2:25 AM, Jason Baker ja...@apture.com wrote: When I run nodetool repair on a node on my 3-node cluster, I see 3 messages like the following: INFO [manual-repair-6d9a617f-c496-4744-9002-a56909b83d5b] 2011-07-30 18:50:28,464 AntiEntropyService.java (line 636) No

Question about eventually consistent

2011-07-31 Thread Eldad Yamin
Hi, Let’s say that I have 2 datacenters, and a key is changed in both of them at almost exactly the same time (within 1-2 seconds of each other). Datacenter #1 removes a column and Datacenter #2 adds 2 new columns. Is there any problem with consistency, or will Cassandra handle this situation easily?

Re: Question about eventually consistent

2011-07-31 Thread Peter Schuller
Let’s say that I have 2 datacenters, and a key is changed in both of them at almost exactly the same time (within 1-2 seconds of each other). Datacenter #1 removes a column and Datacenter #2 adds 2 new columns. Is there any problem with consistency, or will Cassandra handle this situation easily? Columns

Re: Using Cassandra for transaction logging, good idea?

2011-07-31 Thread Peter Tillotson
You could try the following: i:20110728 { tx1=va1, tx2=va1, tx3=va1, tx4=va1, tx5=va1, tx6=va1, } The value could either be a blob / json pojo, or a reference off to another row storing the columns representing the value. Taking it further adding a
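The layout above (one row per day, one column per transaction) can be sketched in Python. This is only an illustration under stated assumptions: a plain dict stands in for the column family, and the helper names `bucket_row_key` and `add_transaction` are hypothetical, not part of any Cassandra client API. Only the `i:` prefix and the `YYYYMMDD` bucket come from the message.

```python
from datetime import date


def bucket_row_key(day, prefix="i"):
    """Build a per-day row key such as 'i:20110728' (hypothetical helper)."""
    return "{}:{}".format(prefix, day.strftime("%Y%m%d"))


def add_transaction(store, day, tx_id, value):
    """Store one transaction as a column in that day's row.

    `store` is a plain dict standing in for a column family:
    {row_key: {column_name: column_value}}.
    """
    store.setdefault(bucket_row_key(day), {})[tx_id] = value


if __name__ == "__main__":
    rows = {}
    add_transaction(rows, date(2011, 7, 28), "tx1", "va1")
    add_transaction(rows, date(2011, 7, 28), "tx2", "va1")
    print(rows)
```

Bucketing by day keeps each row bounded in size; a finer bucket (per hour, for instance) would follow the same pattern.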

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Any help? Thanks! On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu springri...@gmail.com wrote: And by the way, my RF=3 and the other two nodes have much more capacity, so why are requests always routed to node3? Could I do a rebalance now, before node repair? On Fri, Jul 29, 2011 at

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread mcasandra
First run nodetool move and then you can run nodetool repair. Before you run nodetool move you will need to determine tokens that each node will be responsible for. Then use that token to perform move. -- View this message in context:

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Is it okay to do nodetool move before a complete repair, using this equation?
    def tokens(nodes):
        for x in xrange(nodes):
            print 2 ** 127 / nodes * x
On Mon, Aug 1, 2011 at 1:17 AM, mcasandra mohitanch...@gmail.com wrote: First run nodetool move and then you can run nodetool

Could I run node repair when disable gossip and thrift?

2011-07-31 Thread Yan Chunlu
I am running 3 nodes with RF=3 on Cassandra v0.7.4. It seems that disablegossip and disablethrift keep a node's load pretty low. Sometimes, when node repair is rebuilding SSTables, I disable gossip and thrift to lower the load. I am not sure whether I can keep them disabled for the whole procedure.

Re: nodetool repair: No neighbors

2011-07-31 Thread Norman Maurer
I created an issue and attached a patch: https://issues.apache.org/jira/browse/CASSANDRA-2979 I was not sure if it would be better to handle it in NodeProbe or StorageService. Bye, Norman 2011/7/31 Sylvain Lebresne sylv...@datastax.com: On Sun, Jul 31, 2011 at 2:25 AM, Jason Baker

Damaged commit log disk causes Cassandra client to get stuck

2011-07-31 Thread Lior Golan
In one of our test clusters we had a damaged commit log disk in one of the nodes. We have replication factor = 2 in this cluster, and write with consistency level = ONE. So we expected that writes would not be affected by such an issue. But what actually happened is that the client that was writing

RE: Using Cassandra for transaction logging, good idea?

2011-07-31 Thread Lior Golan
How about using Snowflake to generate the transaction ids: https://github.com/twitter/snowflake From: Kent Narling [mailto:kent.narl...@gmail.com] Sent: Thursday, July 28, 2011 5:46 PM To: user@cassandra.apache.org Subject: Using Cassandra for transaction logging, good idea? Hi! I am

Re: Damaged commit log disk causes Cassandra client to get stuck

2011-07-31 Thread aaron morton
A couple of timeouts should have kicked in. First, the rpc_timeout on the server side should have kicked in and given the client a (thrift) TimedOutException. Secondly, a client-side socket timeout should be set so the client will time out the socket. Did either of these appear in the client

Re: Could I run node repair when disable gossip and thrift?

2011-07-31 Thread aaron morton
If you disable gossip the node will appear down to others. This would stop the repair from starting. After repair has started it *may* still cause problems when new streams start (it probably does not). If the node is down, other nodes will stop sending writes to it. Disabling thrift will stop

RE: Damaged commit log disk causes Cassandra client to get stuck

2011-07-31 Thread Lior Golan
Thanks Aaron. We will try to pull the logs and post them in this forum. But what I don't understand is why the client should pause at all. We are writing with CL.ONE, and the replication factor is 2. As far as we understand - the client communicates with a certain node (any node for that

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread aaron morton
aaron suggested it's better to run node repair on every node then re-balance it. That's me being cautious with other people's data. It looks like node 3 is overwhelmed. Try getting the move sorted. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton

Re: Damaged commit log disk causes Cassandra client to get stuck

2011-07-31 Thread aaron morton
Yup, it sounds like things may not have failed as they should. Do you have a better definition of "stuck"? Was the client waiting for a single request to complete, or was the client not cycling to another node? If there are some server log details, they may help us understand what happened.

Re: Brisk and Hadoop question

2011-07-31 Thread aaron morton
You may have better luck on the brisk user group http://groups.google.com/group/brisk-users or IRC #datastax-brisk on freenode. I would guess you can do a rolling upgrade to the existing nodes. But brisk has its own snitch (BriskSimpleSnitch) so it may not be possible. Cheers

Re: Using Cassandra for transaction logging, good idea?

2011-07-31 Thread Kent Närling
Sounds interesting. Reading a bit on snowflake, it seems a bit uncertain whether it fulfills the A and B criteria, i.e.: A, eventually return all known transactions; B, not return the same transaction more than once. Also, any reflections on the general idea to use Cassandra like this? It would

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread mcasandra
springrider wrote: is that okay to do nodetool move before a complete repair, using this equation?
    def tokens(nodes):
        for x in xrange(nodes):
            print 2 ** 127 / nodes * x
Yes, use that logic to get the tokens. I think it's safe to run move first and repair later. You are
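The token calculation quoted above is written for Python 2. A minimal Python 3 sketch of the same logic, assuming the 0 to 2**127 token range used by RandomPartitioner, might look like the following. The function name mirrors the one in the thread; the 3-node call is only an example.

```python
def tokens(nodes):
    """Return evenly spaced initial tokens for a ring of `nodes` nodes."""
    # Integer division (//) matches the Python 2 behaviour of the original snippet.
    return [2 ** 127 // nodes * x for x in range(nodes)]


if __name__ == "__main__":
    # Example: a 3-node cluster, as in this thread.
    for token in tokens(3):
        print(token)
```

Each printed value would then be the token handed to nodetool move on the corresponding node, one node at a time.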

Re: Brisk and Hadoop question

2011-07-31 Thread Jeremy Hanna
Check out http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig and that whole page to see an intro to configuring your cluster. Brisk extends these basic ideas. On Jul 31, 2011, at 12:31 PM, mcasandra wrote: Is it possible to add brisk nodes for analytics to already existing real

Re: How tokens work?

2011-07-31 Thread Rafael Almeida
On Saturday, July 30, 2011, Rafael Almeida almeida...@yahoo.com wrote: Hello, I have computers that are better than others in my cluster. In particular, there's one which is much better and I'd like to give it more load than the others. Is it possible? I'm using RandomPartitioner, should I use

Re: Internal error processing get during bootstrap

2011-07-31 Thread Rafael Almeida
I'm going to tell you guys the answers I could find so far. On Tuesday, July 26, 2011, Rafael Almeida almeida...@yahoo.com wrote: I couldn't find much documentation regarding how to make a cluster, but it seemed simple enough. At cassandra server A (10.0.0.2) I had seeds: locahost. At server

Re: Using Cassandra for transaction logging, good idea?

2011-07-31 Thread aaron morton
If you are doing insert only it should be ok. If you want a unique and roughly ordered Tx id, perhaps consider a TimeUUID in the first case; they are as ordered as the clocks generating the UUIDs. Which is about as good as snowflake does; I cannot remember what resolution the two use. Be aware
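As a rough illustration of the TimeUUID suggestion, Python's standard `uuid` module can generate version 1 (time-based) UUIDs and expose the embedded timestamp. This is only a sketch: the helper names `new_tx_id` and `tx_timestamp` are hypothetical, and ordering across machines is still only as good as their clocks, as noted above.

```python
import uuid
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns units.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def new_tx_id():
    """Generate a version 1 (time-based) UUID to use as a transaction id."""
    return uuid.uuid1()


def tx_timestamp(tx_id):
    """Recover the creation time embedded in a version 1 UUID."""
    seconds = (tx_id.time - UUID_EPOCH_OFFSET) / 1e7
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    tx = new_tx_id()
    print(tx, tx_timestamp(tx))
```

The timestamp field has 100-nanosecond resolution, so ids generated on one host sort roughly by creation time.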

Re: How tokens work?

2011-07-31 Thread aaron morton
The recommended approach is for all nodes in a Cassandra cluster to have the same HW spec. If they do not, then you need to treat every node as having the lowest possible spec (i.e. the lowest memory, lowest CPU, lowest disk capacity and throughput). Other than during a HW upgrade, running mixed

Re: Internal error processing get during bootstrap

2011-07-31 Thread aaron morton
The wiki has info on setting up a cluster, see http://wiki.apache.org/cassandra/Operations and http://wiki.apache.org/cassandra/GettingStarted If you get errors, check the server side logs (/var/log/cassandra), and also make sure that you are getting the exception raised by thrift. e.g.

Re: How tokens work?

2011-07-31 Thread Boris Yen
On Mon, Aug 1, 2011 at 8:24 AM, Rafael Almeida almeida...@yahoo.com wrote: On Saturday, July 30, 2011, Rafael Almeida almeida...@yahoo.com wrote: Hello, I have computers that are better than others in my cluster. In particular, there's one which is much better and I'd like to give it more

Re: Could I run node repair when disable gossip and thrift?

2011-07-31 Thread Yan Chunlu
Okay, I see. Thanks a lot for the help! On Mon, Aug 1, 2011 at 5:26 AM, aaron morton aa...@thelastpickle.com wrote: If you disable gossip the node will appear down to others. This would stop the repair from starting. After repair has started it *may* still cause problems when new streams start (it

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Okay, thanks Aaron! On Mon, Aug 1, 2011 at 5:43 AM, aaron morton aa...@thelastpickle.com wrote: aaron suggested it's better to run node repair on every node then re-balance it. That's me being cautious with other people's data. It looks like node 3 is overwhelmed. Try getting the move

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Thanks a lot! I will try the move. On Mon, Aug 1, 2011 at 7:07 AM, mcasandra mohitanch...@gmail.com wrote: springrider wrote: is that okay to do nodetool move before a complete repair, using this equation?
    def tokens(nodes):
        for x in xrange(nodes):
            print 2 ** 127

Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread myreasoner
Hi, my read latency is really horrible and I can't figure out what went wrong. I'm running cassandra 0.8.0 on a 5 machine cluster. The Fingerprint ColumnFamily has 400,000 rows, each row has about 4,000 Super columns, and each super column has 1 to 4 columns. One row looks like: RowKey: 00c26f

Re: Secondary index on composite columns?

2011-07-31 Thread Jonathan Ellis
Sure, but it's still only useful for equality predicates. On Sun, Jul 31, 2011 at 8:50 PM, Boris Yen yulin...@gmail.com wrote: Hi, I was wondering if anyone would know if secondary index can be enabled on composite columns? Regards Boris -- Jonathan Ellis Project Chair, Apache Cassandra

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Hi, try running a major compaction via nodetool on this Column family. The number of SSTables seems quite large. Considering the space used, this might take a few hours and might also impact performance. Cheers, T. On 01/08/11 14:23, myreasoner wrote: Hi, my read latency is

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread myreasoner
If I do ./nodetool -h localhost compact keyspace columnfamily1, will it go out and compact columnfamily1 on all the nodes, not just localhost? Is that correct? -- View this message in context:

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Compaction is machine-local, you need to run it on every node. Do it as a rolling compaction (or in parallel if you can take the performance hit). Cheers, T. On 01/08/11 15:31, myreasoner wrote: If I do ./nodetool -h localhost compact keyspace columnfamily1 it will go out and
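Because compaction is machine-local, a rolling run simply repeats the same nodetool call against each host in turn. A minimal Python sketch follows, assuming nodetool is on the PATH and that the host names (hypothetical here) are reachable; the keyspace and column family names are the ones reported later in this thread. It defaults to a dry run that only prints the commands.

```python
import subprocess


def compact_commands(hosts, keyspace, column_family):
    """Build one 'nodetool -h <host> compact <keyspace> <cf>' command per host."""
    return [
        ["nodetool", "-h", host, "compact", keyspace, column_family]
        for host in hosts
    ]


def rolling_compact(hosts, keyspace, column_family, dry_run=True):
    """Run the commands one host at a time, waiting for each call to return."""
    for cmd in compact_commands(hosts, keyspace, column_family):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if nodetool exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Host names below are placeholders, not taken from the thread.
    rolling_compact(["node1", "node2", "node3"], "MyKeyspace", "Fingerprint")
```

Running the hosts sequentially limits the performance hit to one node at a time; launching the commands in parallel is the alternative mentioned above if the cluster can absorb it.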

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread myreasoner
Thanks. I did *./nodetool -h localhost compact keyspace columnfamily1*. But it came back really quickly and the cfstats don't seem to have changed much. After compaction: Column Family: Fingerprint SSTable count: 2057 Space used (live): 164351343468

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Hi, try nodetool -h localhost compact, check progress with nodetool -h localhost compactionstats, and check system.log. Cheers, T. On 01/08/11 15:47, myreasoner wrote: Thanks. I did *./nodetool -h localhost compact keyspace columnfamily1 *. But it came back really quick and the

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Mina Naguib
Did you run that verbatim, or did you substitute keyspace and columnfamily1 appropriately? Also, is there anything in Cassandra's log file (system.log)? Compacting 150Gb over 2057 SSTables should take a reasonable bit of time... On 2011-07-31, at 11:47 PM, myreasoner wrote: Thanks. I did

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread myreasoner
On the node that the compaction returned almost immediately: *woot@n50:~$ /opt/cassandra/bin/nodetool -h localhost compactionstats pending tasks: 66* However, messages shown on other nodes are: compaction type: Major keyspace: MyKeyspace column family: Fingerprint bytes compacted: 25505066421

Re: Read latency is over 1 minute on a column family with 400,000 rows

2011-07-31 Thread Teijo Holzer
Looks like a broken node, just restart Cassandra on that node. Might want to wait for the compaction to finish on the other nodes. Also, don't forget to JMX gc() manually after the compaction has finished to delete the files on each node. On 01/08/11 16:29, myreasoner wrote: On the node