Re: Read time gets worse during dynamic snitch reset
On Tue, Apr 12, 2011 at 12:26 AM, aaron morton aa...@thelastpickle.com wrote:

> The reset interval clears the latency tracked for each node so a bad node will be read from again. The scores for each node are then updated every 100ms (default) using the last 100 responses from a node. How long does the bad performance last for?

Only a few seconds, but there are a lot of read requests during this time.

> What CL are you reading at? At QUORUM with RF 4 the read request will be sent to 3 nodes, ordered by proximity and wellness according to the dynamic snitch. (For background, see the recent discussion on the dynamic snitch: http://www.mail-archive.com/user@cassandra.apache.org/msg12089.html)

I am reading at CL ONE, with read_repair_chance=0.33, RackInferringSnitch, and keys_cached = rows_cached = 0.

> You can take a look at the weights and timings used by the DynamicSnitch in JConsole under o.a.c.db.DynamicEndpointSnitch. Also at DEBUG log level you will be able to see which nodes the request is sent to.

Everything looks OK. The weights are around 3 for the nodes in the same data center and around 5 for the others. I will turn on DEBUG logging to see if I can find more info.

> My guess is the DynamicSnitch is doing the right thing and the slowdown is a node with a problem getting back into the list of nodes used for your read. It's then moved down the list as its bad performance is noticed.

Looking at the DynamicSnitch MBean I don't see any problems with any of the nodes. My guess is that during the reset time there are reads that are sent to the other data center.

> Hope that helps
> Aaron

Shimi

> On 12 Apr 2011, at 01:28, shimi wrote:
>
>> I finally upgraded from 0.6.x to 0.7.4. The nodes have been running the new version for several days across 2 data centers. I noticed that the read time on some of the nodes increases by 50-60x every ten minutes. There was no indication in the logs of anything happening at the same time. The only thing I know of that runs every 10 minutes is the dynamic snitch reset, so I changed dynamic_snitch_reset_interval_in_ms to 20 minutes, and now I have the problem once every 20 minutes.
>>
>> I am running all nodes with:
>>
>> replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>> strategy_options:
>>   DC1 : 2
>>   DC2 : 2
>> replication_factor: 4
>>
>> (DC1 and DC2 are taken from the IPs.) Is anyone familiar with this kind of behavior?
>>
>> Shimi
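The reset behaviour described in this thread can be sketched in a few lines. This is a toy model, not Cassandra's actual implementation: latency samples are kept per node (last 100 responses), a node's score is its average latency, and a reset clears all history, so a slow node momentarily scores as well as any other and receives reads again until new samples accumulate.

```python
from collections import defaultdict, deque

class DynamicSnitchSketch:
    """Toy model of the reset behaviour (not Cassandra's code): keep the
    last 100 latency samples per node, score a node by its average
    latency, and let reset() clear all history."""
    WINDOW = 100

    def __init__(self):
        self.samples = defaultdict(lambda: deque(maxlen=self.WINDOW))

    def record(self, node, latency_ms):
        self.samples[node].append(latency_ms)

    def score(self, node):
        s = self.samples[node]
        return sum(s) / len(s) if s else 0.0  # no history: best possible score

    def reset(self):
        self.samples.clear()

snitch = DynamicSnitchSketch()
for _ in range(100):
    snitch.record("fast-node", 2.0)
    snitch.record("slow-node", 120.0)

print(snitch.score("slow-node"))  # 120.0 -> deprioritised for reads
snitch.reset()
print(snitch.score("slow-node"))  # 0.0 -> read from again until new samples arrive
```

This illustrates why a few seconds of degraded reads right after each reset interval is consistent with a genuinely slow node being retried.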
Re: Cassandra constantly logs nodes which don't exist anymore
2011/4/12 aaron morton aa...@thelastpickle.com:

> In JConsole go to o.a.c.db.HintedHandoffManager and try the deleteHintsForEndpoint operation. This is also called when a token is removed from the ring, or when a node is decommissioned. What process did you use to reconfigure the cluster?

I decommissioned a node, then restarted all nodes in the cluster step by step. When I repeat the restart operation twice, this log entry disappears.
Re: problems getting started with Cassandra Ruby
Hello Mark,

Disable verbose mode (-w or $VERBOSE) in Ruby. Or you can clean up the Ruby thrift library yourself.

2011/4/12 Mark Lilback mlilb...@stat.wvu.edu:

> I'm trying to connect to Cassandra from a Ruby script. I'm using rvm, and made a clean install of Ruby 1.9.2 and then did gem install cassandra. When I run a script that just contains require 'cassandra/0.7', I get the output below. Any suggestion on what I need to do to get rid of these warnings?
>
> /Users/admin/.rvm/gems/ruby-1.9.2-p180/gems/thrift-0.5.0/lib/thrift/server/nonblocking_server.rb:80: warning: `*' interpreted as argument prefix
> /Users/admin/.rvm/gems/ruby-1.9.2-p180/gems/thrift-0.5.0/lib/thrift_native.bundle: warning: method redefined; discarding old skip
> /Users/admin/.rvm/gems/ruby-1.9.2-p180/gems/thrift-0.5.0/lib/thrift/protocol/base_protocol.rb:235: warning: previous definition of skip was here
> (snip)
>
> --
> Mark Lilback
> West Virginia University Department of Statistics
> mlilb...@stat.wvu.edu

--
w3m
Questions about the nodetool ring.
I have 3 Cassandra 0.7.4 nodes in a cluster, and I get these ring stats:

[root@yun-phy2 apache-cassandra-0.7.4]# bin/nodetool -h 192.168.1.28 -p 8090 ring
Address       Status  State   Load       Owns    Token
                                                 109028275973926493413574716008500203721
192.168.1.25  Up      Normal  157.25 MB  69.92%  57856537434773737201679995572503935972
192.168.1.27  Up      Normal  201.71 MB  24.28%  99165710459060760249270263771474737125
192.168.1.28  Up      Normal  68.12 MB   5.80%   109028275973926493413574716008500203721

The load and owns vary on each node; is this normal? And is there a way to balance the three nodes? Thanks.

--
Dikang Gu
0086 - 18611140205
Re: Questions about the nodetool ring.
This is normal when you just add single nodes. When no token is assigned, the new node takes a portion of the ring from the most heavily loaded node. As a consequence, the nodes will be out of balance. In other words, if you doubled the number of nodes you would not have this problem.

The best way to rebalance the cluster is to generate new tokens and use the nodetool move <new-token> command to rebalance the nodes, one at a time. After rebalancing you can run cleanup so the nodes get rid of data they are no longer responsible for.

Links:
http://wiki.apache.org/cassandra/Operations#Range_changes
http://wiki.apache.org/cassandra/Operations#Moving_or_Removing_nodes
http://www.datastax.com/docs/0.7/operations/clustering#adding-capacity
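Generating evenly spaced tokens can be done with a short script. This is a sketch that assumes the default RandomPartitioner, whose token space is [0, 2**127): token i for an N-node ring is i * 2**127 / N.

```python
# Evenly spaced initial tokens for a RandomPartitioner ring
# (token space is the integers [0, 2**127)).
def balanced_tokens(node_count):
    return [i * 2**127 // node_count for i in range(node_count)]

for i, token in enumerate(balanced_tokens(3)):
    print(f"node {i}: {token}")
# node 0: 0
# node 1: 56713727820156410577229101238628035242
# node 2: 113427455640312821154458202477256070485
```

For 3 nodes this reproduces the tokens quoted elsewhere in this thread; each token is then assigned with nodetool move, one node at a time.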
Re: Questions about the nodetool ring.
The 3 nodes were added to the cluster at the same time, so I'm not sure why the data vary.

I calculated the tokens and got:

node 0: 0
node 1: 56713727820156410577229101238628035242
node 2: 113427455640312821154458202477256070485

So should I set these tokens on the three nodes? And while I execute the nodetool move commands, can the Cassandra servers serve front-end requests at the same time? Is the data safe? Thanks.

--
Dikang Gu
0086 - 18611140205
Re: Questions about the nodetool ring.
After the nodetool move, I got this:

[root@server3 apache-cassandra-0.7.4]# bin/nodetool -h 10.18.101.213 ring
Address        Status  State   Load      Owns    Token
                                                 113427455640312821154458202477256070485
10.18.101.211  ?       Normal  82.31 MB  33.33%  0
10.18.101.212  ?       Normal  84.24 MB  33.33%  56713727820156410577229101238628035242
10.18.101.213  Up      Normal  54.44 MB  33.33%  113427455640312821154458202477256070485

Is this correct? Why is the status "?"? Thanks.

--
Dikang Gu
0086 - 18611140205
Re: Timeout during stress test
A couple of hits here, one from Jonathan and some previous discussions on the user list: http://www.google.co.nz/search?q=cassandra+iostat

Same here for cfhistograms: http://www.google.co.nz/search?q=cassandra+cfhistograms

cfhistograms includes information on the number of sstables read during recent requests. As your initial cfstats showed 236 sstables, I thought it may be useful to see if a high number of sstables is being accessed per read.

70 requests per second is slow against a 6 node cluster where each node has 12 cores and 96GB of RAM. Something is not right.

Aaron

On 12 Apr 2011, at 17:11, mcasandra wrote:

> aaron morton wrote:
>> You'll need to provide more information; from the TP stats the read stage could not keep up. If the node is not CPU bound then it is probably IO bound. What sort of read? How many columns was it asking for? How many columns do the rows have? Was the test asking for different rows? How many requests per second did it get up to? What do the io stats look like? What does nodetool cfhistograms say?
>
> It's a simple read of 1M rows with one column of average size 200K. Got around 70 requests per second. Not sure how to interpret the iostat output with things happening asynchronously in Cassandra. Can you give a little description of how to interpret it? I have posted the output of cfstats. Does cfhistograms provide better info?
>
> --
> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Timeout-during-stress-test-tp6262430p6263859.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Questions about the nodetool ring.
When you do a move, the node is decommissioned and bootstrapped. During the autobootstrap process the node will not receive reads until bootstrapping is complete. I assume the node will also be unavailable during the decommission phase; someone correct me if I'm wrong.

The ring distribution looks better now. The "?" I get all the time too, and if you run ring against different hosts, the question marks probably appear in different places. I'm not sure if it means there is a problem; I haven't taken those question marks too seriously.
Cassandra 0.6.3 error: Connection refused to host: 127.0.0.1
Hi All,

I have migrated my server to CentOS 5.5. Everything is up, but I'm facing a little issue. I have two Cassandra nodes:

10.0.0.4  cassandra2
10.0.0.3  cassandra1

I am using OpenJDK with Cassandra. We are facing the following error when using nodetool, but only on one server (cassandra2). The hosts file is also pasted below. Please let me know how I can fix this issue.

sh nodetool -h 10.0.0.3 ring
Error connecting to remote JMX agent!
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:

sh nodetool -h 10.0.0.4 ring
Address   Status  Load      Range                                     Ring
                            129069858893052904163677015069685590304
10.0.0.3  Up      10.02 GB  104465788091875410298027059042850717029  |--|
10.0.0.4  Up      9.98 GB   129069858893052904163677015069685590304  |--|

Hosts file:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1   localhost.localdomain localhost
10.0.0.4    cassandra2.pringit.com
#::1        localhost6.localdomain6 localhost6

--
S.Ali Ahsan
Senior System Engineer
e-Business (Pvt) Ltd
49-C Jail Road, Lahore, P.O. Box 676, Lahore 54000, Pakistan
Tel: +92 (0)42 3758 7140 Ext. 128
Mobile: +92 (0)345 831 8769
Fax: +92 (0)42 3758 0027
Email: ali.ah...@panasiangroup.com
www.ebusiness-pg.com
www.panasiangroup.com
repair never completes with finished successfully
There are a few other threads related to problems with nodetool repair in 0.7.4. However, I'm not seeing any errors; I'm just never getting a message that the repair completed successfully.

In my production and test clusters (with just a few MB of data) the nodetool repair prompt never returns, and the last entry in cassandra.log is always something like:

#TreeRequest manual-repair-f739ca7a-bef8-4683-b249-09105f6719d9, /10.46.108.102, (DFS,main) completed successfully: 1 outstanding

But I don't see a message, even hours later, that the 1 outstanding request finished successfully. Anyone else experiencing this? These are physical server nodes in local data centers, not EC2.
Re: Read time gets worse during dynamic snitch reset
Something feels odd. From Peter's nice write-up of the dynamic snitch (http://www.mail-archive.com/user@cassandra.apache.org/msg12092.html), the RackInferringSnitch (and the PropertyFileSnitch) derive from AbstractNetworkTopologySnitch and should...

> In the case of the NetworkTopologyStrategy, it inherits the implementation in AbstractNetworkTopologySnitch which sorts by AbstractNetworkTopologySnitch.compareEndpoints(), which:
> (1) Always prefers itself to any other node. So "myself" is always closest, no matter what.
> (2) Else, always prefers a node in the same rack to a node in a different rack.
> (3) Else, always prefers a node in the same DC to a node in a different DC.

AFAIK the (data) request should be going to the local DC even after the DynamicSnitch has reset the scores, because the underlying RackInferringSnitch should prefer local nodes.

Just for fun, check that the rack and DC assignments are what you thought, using the operations on the o.a.c.db.EndpointSnitchInfo bean in JConsole. Pass in the IP address of the nodes in each DC. If possible, can you provide some info on the IPs in each DC?

Aaron
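The three proximity rules quoted above can be expressed as a sort key. This is an illustrative sketch only, not Cassandra's code; the topology dict mapping each address to a (dc, rack) pair is a hypothetical stand-in for what a snitch such as RackInferringSnitch derives from the address octets.

```python
# Sketch of the proximity ordering described above:
# (1) prefer myself, (2) then my rack, (3) then my data center.
def sort_by_proximity(me, endpoints, topology):
    my_dc, my_rack = topology[me]

    def key(ep):
        dc, rack = topology[ep]
        return (
            ep != me,                               # (1) myself first
            not (dc == my_dc and rack == my_rack),  # (2) then same rack
            dc != my_dc,                            # (3) then same DC
        )

    return sorted(endpoints, key=key)

topology = {
    "10.0.1.1": ("DC1", "rack1"),
    "10.0.1.2": ("DC1", "rack1"),
    "10.0.2.1": ("DC1", "rack2"),
    "10.1.1.1": ("DC2", "rack1"),
}
print(sort_by_proximity("10.0.1.1", list(topology), topology))
# ['10.0.1.1', '10.0.1.2', '10.0.2.1', '10.1.1.1']
```

Under this ordering the remote-DC node always sorts last, which is why reads landing in the other data center after a snitch reset would be surprising.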
Re: repair never completes with finished successfully
I've seen this. To fix it, try a nodetool compact and then a repair.

--
Karl
Re: Strange readRepairChance in server logs
Bug in the CLI; created / fixed: https://issues.apache.org/jira/browse/CASSANDRA-2458. Use 70 for now.

Thanks
Aaron

On 12 Apr 2011, at 20:46, Héctor Izquierdo Seliva wrote:

> Hi everyone. I've changed the read repair chance of one of my column families from cassandra-cli with the following entry:
>
> update column family cf with read_repair_chance = 0.7
>
> I expected to see readRepairChance=0.7 in the server log. Instead I saw readRepairChance=0.006999. Should I use read_repair_chance = 70 instead of 0.7?
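For what it's worth, the odd logged value is consistent with the CLI bug: the input appears to be treated as a percentage and divided by 100 before being stored, so 0.7 becomes roughly 0.007, which floating point can render as 0.006999... (the value seen in the server log). A minimal sketch of that arithmetic:

```python
# Sketch of the arithmetic behind the CLI bug: read_repair_chance input
# treated as a percentage and divided by 100 before storage.
requested = 0.7
stored = requested / 100  # the buggy percentage conversion
assert 0.0069 < stored < 0.0071  # ~0.007, logged as 0.006999...
```

This also explains why passing 70 is the suggested workaround until the fix ships: 70 / 100 gives the intended 0.7.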
Re: Questions about the nodetool ring.
If you are seeing different views of the ring from different nodes, you may have some sickness: http://www.datastax.com/docs/0.7/troubleshooting/index#view-of-ring-differs-between-some-nodes

The "?" in the ring output happens when one node does not know if the other is alive or dead. This could be due to the corrupt gossip state described in the link above.

During a move, the node will decommission and stop taking requests for the range it was responsible for, but other nodes in the cluster will take its place. Once it starts bootstrapping it will start accepting writes but not reads. The cluster stays online for all token ranges.

Dikang, did you allow the first move to complete before starting the second?

Aaron
Re: repair never completes with finished successfully
There is no "Repair session" message either. It just starts with a message like:

INFO [manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723] 2011-04-10 14:00:59,051 AntiEntropyService.java (line 770) Waiting for repair requests: [#TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.46.108.101, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.100, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.102, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.101, (DFS,main)]

netstats:

Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name   Active  Pending  Completed
Commands    n/a     0        150846
Responses   n/a     0        443183

One node in our cluster still has unreadable rows, where the reads trip up every time for certain sstables (you've probably seen my earlier threads regarding that). My suspicion is that the bloom filter read on the node with the corrupt sstables never reports back to the repair, causing it to hang.

What would be great is a scrub tool that ignores unreadable/unserializable rows! :)

On Apr 12, 2011, at 2:15 PM, aaron morton wrote:

> Do you see a message starting "Repair session" and ending with "completed successfully"? Or do you see any streaming activity using nodetool netstats? Repair can hang if a neighbour dies and fails to send a requested stream. It will time out after 24 hours (I think).
>
> Aaron
Re: Strange readRepairChance in server logs
Thanks Aaron!
Cassandra monitoring tool
Hi everyone. Looking for ways to monitor Cassandra with Zabbix, I could not find anything that was really usable, until I found mention of a nice class by smeet. I have based my modification on his work and now I give it back to the community. Here's the project URL: http://code.google.com/p/simple-cassandra-monitoring/

It lets you get statistics for any Keyspace/ColumnFamily you want. To start it, just build the jar and launch it using your Cassandra installation's lib folder as the classpath. The first parameter is the node host name. The second parameter is a comma separated list of KS:CF values. For example: java -cp blablabla localhost ks1:cf1,ks1:cf2. Then point curl to http://localhost:9090/ks1/cf1 and some basic stats will be displayed. You can also point to http://localhost:9090/nodeinfo to get some info about the server.

If you have any suggestion or improvement you would like to see, please contact me and I will be glad to work on it. Right now it's a bit rough, but it gets the job done. Thanks for your time!
quick repair tool question
Does a repair just compare the existing data from sstables on the node being repaired, or will it figure out which data this node should have and copy it in?

I'm trying to refresh all the data for a given node (without reassigning the token), starting with an emptied-out data directory. I tried nodetool move, but if I give it the same token it was previously assigned, it doesn't seem to trigger a decommission/bootstrap. Thanks.
Re: quick repair tool question
I think I answered the question myself. The data is streaming in from other replicas even though the node's data dir was emptied out (system dir was left alone). I'm not sure if this is the kosher way to rebuild the sstable data, but it seemed to work. /var/lib/cassandra/data # /opt/cassandra/bin/nodetool -h $HOSTNAME -p 35014 netstats Mode: Normal Not sending any streams. Streaming from: /10.46.108.100 DFS: /var/lib/cassandra/data/DFS/main-f-85-Data.db/(101772144,192460041),(192460041,267088244) progress=0/165316100 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-86-Data.db/(118410757,194489915),(194489915,247653739) progress=0/129242982 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-40-Data.db/(4823893695,4850323665),(4850323665,7818579650) progress=0/2994685955 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-89-Data.db/(0,707948),(707948,2011040) progress=0/2011040 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-70-Data.db/(778069440,1015544852),(1015544852,1200443249) progress=0/422373809 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-71-Data.db/(119366025,132069485),(132069485,156787816) progress=0/37421791 - 0% Streaming from: /10.47.108.100 DFS: /var/lib/cassandra/data/DFS/main-f-365-Data.db/(0,24748050),(126473995,170409694) progress=0/68683749 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-367-Data.db/(0,935041),(935041,2238133) progress=0/2238133 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-366-Data.db/(0,4608808),(37713613,46884920) progress=0/13780115 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-242-Data.db/(0,1057203157),(3307900143,4339490352) progress=0/2088793366 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-352-Data.db/(0,19422069),(81246761,122537002) progress=0/60712310 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-225-Data.db/(0,1580865981),(4540941750,6024843721) progress=0/3064767952 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-349-Data.db/(0,21720053),(54115405,71716716) progress=0/39321364 - 0% DFS: 
/var/lib/cassandra/data/DFS/main-f-364-Data.db/(0,72606213),(175419693,238159626) progress=0/135346146 - 0%
DFS: /var/lib/cassandra/data/DFS/main-f-363-Data.db/(0,1184983783),(3458591846,4556646617) progress=0/2283038554 - 0%
DFS: /var/lib/cassandra/data/DFS/main-f-368-Data.db/(0,756228),(756228,1626647) progress=0/1626647 - 0%
DFS: /var/lib/cassandra/data/DFS/main-f-361-Data.db/(48074007,78009236) progress=0/29935229 - 0%
DFS: /var/lib/cassandra/data/DFS/main-f-226-Data.db/(0,3111952321),(8592898278,11484622800) progress=0/6003676843 - 0%
Pool Name  Active  Pending  Completed
Commands   n/a     0        5765
Responses  n/a     0        9811
Cassandra 2 DC deployment
Hi experts, We are planning to deploy Cassandra in 2 datacenters. Let's assume there are 3 nodes, RF=3: 2 nodes in one DC and 1 node in the 2nd DC. Under normal operations we would read and write at QUORUM. What we want, though, is that if we lose the datacenter which has 2 nodes, DC1 in this case, we downgrade our consistency to ONE. Basically I am saying that whenever there is a partition, prefer availability over consistency. To do this we plan to catch UnavailableException and take corrective action: try QUORUM under normal circumstances, and if unavailable, try ONE. My questions: Do you guys see any flaws with this approach? What happens when DC1 comes back up and we start reading/writing at QUORUM again? Will we read stale data in this case? Thanks -Raj
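The try-QUORUM-then-fall-back-to-ONE idea can be sketched in client-agnostic Python. The names `UnavailableException`, `read_at`, `QUORUM`, and `ONE` below are stand-ins for whatever your driver (pycassa, Hector, Pelops, ...) actually provides, not a real API:

```python
# Sketch of the "QUORUM, fall back to ONE" read described above.
QUORUM, ONE = "QUORUM", "ONE"

class UnavailableException(Exception):
    """Raised when too few replicas are alive to satisfy the consistency level."""

def read_with_fallback(read_at, key):
    try:
        # Normal case: prefer consistency.
        return read_at(key, QUORUM)
    except UnavailableException:
        # Partition (e.g. the two-node DC is down): prefer availability,
        # accepting possibly stale data.
        return read_at(key, ONE)

# Toy backend simulating the DC1 outage: QUORUM fails, ONE succeeds.
def flaky_read(key, consistency):
    if consistency == QUORUM:
        raise UnavailableException("2 of 3 replicas unreachable")
    return {"key": key, "served_at": consistency}

print(read_with_fallback(flaky_read, "user:42"))
```

The same wrapper works for writes; the trade-off is exactly the staleness question raised above, since a write accepted at ONE during the partition is only on one replica until hinted handoff or read repair catches up.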
Re: Cassandra 2 DC deployment
When the down data center comes back up, the QUORUM reads will trigger read repair, so you will get valid data. Besides that, hinted handoff will take care of getting data replicated to a previously down node. Your example is a little unrealistic because you could theoretically lose the DC with only one node, in which case CL.ONE would work every time. But if you have more than one node, you have to decide whether your application can tolerate getting NULL for a read if the write hasn't propagated from the responsible node to the replica. Disclaimer: I'm a Cassandra novice.
Re: cassandra 0.6.3 error Connection refused to host: 127.0.0.1;
Can anyone please help? On 04/12/2011 04:07 PM, Ali Ahsan wrote: Hi All, I have migrated my server to CentOS 5.5. Everything is up, but I am facing a small issue. I have two Cassandra nodes: 10.0.0.4 cassandra2, 10.0.0.3 cassandra1. I am using OpenJDK with Cassandra. We are facing the following error when using nodetool, but only on one server, cassandra2. The hosts file is also pasted below. Please let me know how I can fix this issue.
sh nodetool -h 10.0.0.3 ring
Error connecting to remote JMX agent! java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: ---
sh nodetool -h 10.0.0.4 ring
Address   Status  Load      Range                                    Ring
                            129069858893052904163677015069685590304
10.0.0.3  Up      10.02 GB  104465788091875410298027059042850717029  |--|
10.0.0.4  Up      9.98 GB   129069858893052904163677015069685590304  |--|
Hosts file
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
10.0.0.4 cassandra2.pringit.com
#::1 localhost6.localdomain6 localhost6
-- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com Confidentiality: This e-mail and any attachments may be confidential and/or privileged. If you are not a named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose, or store or copy the information in any medium. Internet communications cannot be guaranteed to be timely, secure, error- or virus-free. We do not accept liability for any errors or omissions.
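For background on the error above (an assumption worth checking, not a confirmed diagnosis for this thread): nodetool talks to Cassandra over JMX/RMI, and "Connection refused to host: 127.0.0.1" often means the RMI stub is advertising the loopback address instead of the node's real IP. A commonly suggested workaround is to pin the advertised hostname in the JVM options used to start Cassandra:

```shell
# In the script that builds Cassandra's startup JVM options
# (cassandra.in.sh / cassandra-env.sh, depending on version).
# Use the node's own reachable IP, e.g. 10.0.0.3 for cassandra1.
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.0.3"
```

Also double-check that the hosts file maps each node's hostname to its real IP rather than to 127.0.0.1.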
Re: Cassandra monitoring tool
Thanks for sharing this info. I am getting the following error; can you please be more specific about how I can run this? java -cp /home/ali/apache-cassandra-0.6.3/lib/simple-cassandra-monitoring-1.0.jar 127.0.0.1 ks1:cf1,ks1:cf2 Exception in thread main java.lang.NoClassDefFoundError: 127/0/0/1 Caused by: java.lang.ClassNotFoundException: 127.0.0.1 at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) Could not find the main class: 127.0.0.1. Program will exit. OR java -jar /home/ali/apache-cassandra-0.6.3/lib/simple-cassandra-monitoring-1.0.jar localhost ks1:cf1,ks1:cf2 Failed to load Main-Class manifest attribute from /home/ali/apache-cassandra-0.6.3/lib/simple-cassandra-monitoring-1.0.jar On 04/12/2011 07:26 PM, Héctor Izquierdo Seliva wrote: Hi everyone. Looking for ways to monitor Cassandra with Zabbix, I could not find anything that was really usable, until I found mention of a nice class by smeet. I have based my modification on his work and now I give it back to the community. Here's the project URL: http://code.google.com/p/simple-cassandra-monitoring/ It lets you get statistics for any Keyspace/ColumnFamily you want. To start it, just build the jar and launch it using your Cassandra installation's lib folder as the classpath. The first parameter is the node host name. The second parameter is a comma-separated list of KS:CF values. For example: java -cp blablabla localhost ks1:cf1,ks1:cf2. Then point curl at http://localhost:9090/ks1/cf1 and some basic stats will be displayed. You can also point at http://localhost:9090/nodeinfo to get some info about the server.
If you have any suggestion or improvement you would like to see, please contact me and I will be glad to work on it. Right now it's a bit rough, but it gets the job done. Thanks for your time!
pycassa timeouts resolved by killing a random node in the ring
Interesting issue this morning. My apps started throwing a bunch of pycassa timeouts all of a sudden. The ring looked perfect. No load issues anywhere, and no errors in the logs. The site was basically down, so I got desperate and whacked a random node in the ring. As soon as gossip saw it go down, the timeouts went away. Thinking that was kinda crazy, I started the node back up. As soon as it rejoined the ring, pycassa started timing out again. I then killed another random node, far away from the first node I killed, and the timeouts stopped again. Started it back up, and the timeouts started again when it rejoined the ring. Repeated this process once more just to make sure I wasn't insane, and the same result happened. Killing any single node, anywhere in the ring, fixes my timeouts. Actively able to repro this. I am having to just keep one node down right now so the site doesn't break. Desperate for any suggestions or advice on this. Using pycassa 1.0.7. Timeout is set to 15 seconds, with 3 retries. Reads and writes are in quorum. 27 nodes in the ring, with an RF of 3. Thanks, Jason
Re: Timeout during stress test
Here is what cfhistograms looks like. I don't really understand what this means; I will try to read up on it. I also see %util in iostat continuously at 90%. Not sure if this is caused by extra reads by Cassandra. It seems unusual.
[root@dsdb4 ~]# nodetool -h `hostname` cfhistograms StressKeyspace StressStandard
StressKeyspace/StressStandard histograms
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1       45720     0              0             0         498857
2       0         0              0             0         0
3       0         0              0             0         0
4       0         0              0             0         0
5       0         0              0             0         0
6       0         0              1             0         0
7       0         0              1             0         0
8       0         0              0             0         0
10      0         0              0             0         0
12      0         0              0             0         0
14      0         0              0             0         0
17      0         1              0             0         0
20      0         2              0             0         0
24      0         1              0             0         0
29      0         6              0             0         0
35      0         68             0             0         0
42      0         509            0             0         0
50      0         1128           0             0         0
60      0         1449           0             0         0
72      0         789            0             0         0
86      0         400            0             0         0
103     0         319            0             0         0
124     0         388            0             0         0
149     0         456            0             0         0
179     0         519            0             0         0
215     0         262            0             0         0
258     0         194            0             0         0
310     0         48             0             0         0
372     0         5              0             0         0
446     0         1              0             0         0
535     0         0              0             0         0
642     0         0              0             0         0
770     0         1              0             0         0
924     0         1              0             0         0
1109    0         0              0             0         0
1331    0         1              0             0         0
1597    0         0              0             0
1916    1         0              0             0
2299    0         0              0             0
2759    0         0              0             0
3311    0         0              0             0
3973    1         0              0             0
4768    5         0              0             0
5722    19        0              0             0
6866    46        0              0             0
8239    102       0              0             0
9887    226       0              0             0
11864   368       0
RE: batch_mutate failed: out of sequence response
[I wrote this Apr 10, 2011 at 12:09 but my message seems to have gotten lost along the way.] I use Pelops (the 1.0-0.7.x build from the GitHub Maven repo) and have occasionally seen this message (under load or during GC). I have a test app running in two separate single-threaded processes doing a slow trickle insert into a single Cassandra 0.7.4 node, all on the same box (Mac OS X). This had been running off and on for over a week with no exceptions, and I just saw this same error about two hours ago. Both client processes experienced it at about the same time, and it seemed related to a GC/compaction on the Cassandra instance. I'm guessing that it is either actually a read timeout on the clients, or (less likely) that somehow the Cassandra instance mixed up the two responses. On Fri, Apr 8 2011 at 07:28, Dan Washusen d...@reactive.org wrote: Dan Hendry mentioned that he sees these errors. Is he also using Pelops? From his comment about retrying I'd assume not... -- Dan Washusen On Thursday, 7 April 2011 at 7:39 PM, Héctor Izquierdo Seliva wrote: On Wed, 06-04-2011 at 21:04 -0500, Jonathan Ellis wrote: "out of sequence response" is Thrift's way of saying "I got a response for request Y when I expected request X". My money is on using a single connection from multiple threads. Don't do that. I'm not using Thrift directly, and my application is single-threaded, so I guess this is Pelops' fault somehow. Since I managed to tame memory consumption the problem has not appeared again, but it always happened during a stop-the-world GC. Could it be that the message was sent instead of being dropped by the server when the client assumed it had timed out?
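As Jonathan's quoted advice says, the usual culprit is two threads interleaving request/response pairs on one shared Thrift connection. One common fix is one connection per thread, which `threading.local` makes easy. A minimal sketch; `make_connection` is a hypothetical stand-in for your client's real connect call:

```python
import threading

_local = threading.local()

def make_connection():
    # Stand-in for opening a real Thrift socket; here just a unique object.
    return object()

def get_connection():
    # Lazily create, then reuse, a private connection for the calling thread.
    if not hasattr(_local, "conn"):
        _local.conn = make_connection()
    return _local.conn

# Demonstration: four threads each get their own connection, so request and
# response ordering can never be interleaved across threads.
seen = []
def worker():
    seen.append(get_connection())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len({id(c) for c in seen}))  # 4 distinct connections
```

Connection pools in the real client libraries (pycassa's `ConnectionPool`, Pelops' pools) exist to solve the same problem; the point is simply that a single raw connection must never be shared across threads.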
Re: Cassandra monitoring tool
On Tue, 12-04-2011 at 21:24 +0500, Ali Ahsan wrote: Thanks for sharing this info. I am getting the following error; can you please be more specific about how I can run this? Hi Ali. You should run it like this: java -cp /home/ali/apache-cassandra-0.6.3/lib/* com.google.code.scm.CassandraMonitoring localhost ks1:cf1,ks2:cf2,etc I forgot to mention it has been coded against 0.7.x, and I'm not sure it will work on 0.6.x. I'll try to add support for both 0.6.x and the new 0.8.x version as soon as possible.
forced index creation?
hi, just deployed a new keyspace on 0.7.4 and added the following column family: create column family applications with comparator=UTF8Type and column_metadata=[ {column_name: app_name, validation_class: UTF8Type}, {column_name: app_uri, validation_class: UTF8Type, index_type: KEYS}, {column_name: app_id, validation_class: UTF8Type} ]; I then proceeded to add two new rows of data to it. When I try to query the secondary index on app_uri, my query with phpcassa fails. On the same CF in a different cluster it works fine. When comparing the CF between clusters, I see there's a difference: a "Built indexes:" line shows up when I run "describe keyspace foobar;". Column Metadata: Column Name: app_name (app_name) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: app_id (app_id) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: app_uri (app_uri) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Type: KEYS Checking out a bit further: get applications where 'app_uri' = 'get-test'; --- RowKey: 9d699733-9afe-4a41-83ca-c60d040dacc0 get applications where 'app_id' = '9d699733-9afe-4a41-83ca-c60d040dacc0'; No indexed columns present in index clause with operator EQ So I can see that the secondary indexes are working. Question 1: Has "Built indexes" been removed from the describe keyspace output? Or have I done something wrong? Question 2: Is there a way to force secondary index creation? -- Sasha Dolgy sasha.do...@gmail.com
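For question 2, one workaround sometimes used (a sketch against the 0.7-era cassandra-cli; syntax and behavior may differ on your version, and taking a snapshot first is prudent) is to drop and re-add the index definition: update the column family with app_uri's metadata minus index_type, then update it again with index_type: KEYS, which should cause the index to be built over the existing data:

```
update column family applications with comparator = UTF8Type and column_metadata = [
    {column_name: app_name, validation_class: UTF8Type},
    {column_name: app_uri,  validation_class: UTF8Type},
    {column_name: app_id,   validation_class: UTF8Type}
];
update column family applications with comparator = UTF8Type and column_metadata = [
    {column_name: app_name, validation_class: UTF8Type},
    {column_name: app_uri,  validation_class: UTF8Type, index_type: KEYS},
    {column_name: app_id,   validation_class: UTF8Type}
];
```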
Re: Cassandra monitoring tool
On 04/12/2011 10:42 PM, Héctor Izquierdo Seliva wrote: I forgot to mention it has been coded against 0.7.x, and I'm not sure it will work on 0.6.x. I'll try to add support for both 0.6.x and the new 0.8.x version as soon as possible. I think this error is because of 0.6.3? Exception in thread main java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.io.EOFException] at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:342) at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:267) at com.google.code.scm.CassandraMonitoring.start(CassandraMonitoring.java:58) at com.google.code.scm.CassandraMonitoring.main(CassandraMonitoring.java:190) Caused by: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.io.EOFException] at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:118) at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:203) at javax.naming.InitialContext.lookup(InitialContext.java:409) at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1902) at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1871) at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:276) ... 3 more Caused by: java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.io.EOFException at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:304) at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202) at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:340) at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source) at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:114) ...
8 more Caused by: java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:267) at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:246) ... 12 more
Cassandra node's replication factor two with random partition non Bootstrap node problem
Hi All, I have two Cassandra nodes. If the bootstrapped node goes down, my service remains alive; but if my non-bootstrap (master) node goes down, my live site goes down as well. I am using Cassandra 0.6.3. Can anyone elaborate on this problem?
Re: Cassandra monitoring tool
I'm not sure. Are you running it on the same host as the Cassandra node? On Tue, 12-04-2011 at 22:54 +0500, Ali Ahsan wrote: I think this error is because of 0.6.3?
Ec2Snitch + NetworkTopologyStrategy if only in one region?
Hi, I'm getting closer to committing to Cassandra, and now I'm on to system/IT issues and questions. I'm in the Amazon EC2 cloud. I previously used this forum to discover the best practice for disk layouts (large instance + the two ephemeral disks in RAID0 for data + root volume for everything else). Now I'm hoping to confirm bits and pieces of things I've read about snitch/replication strategies. I was thinking of using endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' (for people hitting this from the mailing list or Google, I feel obligated to note that the former setting is in cassandra.yaml, and the latter is an option on a keyspace). But I'm only in one region. Is using the Amazon snitch/NetworkTopologyStrategy overkill given that everything I have is in one DC (I believe region==DC and availability_zone==rack)? I'm using multiple availability zones for some level of redundancy; I'm just not yet at the point of using multiple regions. If someday I move to using multiple regions, would that change the answer? Thanks! -- Will Oberman Civic Science, Inc. 3030 Penn Avenue., First Floor Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
Re: Cassandra monitoring tool
Yes, same host. I will test this with my developer team and let you know more about it. On 04/12/2011 11:14 PM, Héctor Izquierdo Seliva wrote: I'm not sure. Are you running it on the same host as the Cassandra node?
Re: Lot of pending tasks for writes
Can someone please help? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Lot-of-pending-tasks-for-writes-tp6263462p6266213.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
flush_largest_memtables_at messages in 7.4
I am using Cassandra 0.7.4 and getting these messages: "Heap is 0.7802529021498031 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically." How do I verify that I need to adjust any thresholds? And how do I calculate the correct value? When I got this message, only reads were occurring.
create keyspace StressKeyspace with replication_factor = 3 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use StressKeyspace;
drop column family StressStandard;
create column family StressStandard with comparator = UTF8Type and keys_cached = 100 and memtable_flush_after = 1440 and memtable_throughput = 128;
nodetool -h dsdb4 tpstats
Pool Name              Active  Pending  Completed
ReadStage              32      281      456598
RequestResponseStage   0       0        797237
MutationStage          0       0        499205
ReadRepairStage        0       0        149077
GossipStage            0       0        217227
AntiEntropyStage       0       0        0
MigrationStage         0       0        201
MemtablePostFlusher    0       0        1842
StreamStage            0       0        0
FlushWriter            0       0        1841
FILEUTILS-DELETE-POOL  0       0        3670
MiscStage              0       0        0
FlushSorter            0       0        0
InternalResponseStage  0       0        0
HintedHandoff          0       0        15
cfstats
Keyspace: StressKeyspace
Read Count: 460988
Read Latency: 38.07654727454945 ms.
Write Count: 499205
Write Latency: 0.007409593253272703 ms.
Pending Tasks: 0
Column Family: StressStandard
SSTable count: 9
Space used (live): 247408645485
Space used (total): 247408645485
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 1878
Read Count: 460989
Read Latency: 28.237 ms.
Write Count: 499205
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 100
Key cache size: 299862
Key cache hit rate: 0.6031833150384193
Row cache: disabled
Compacted row minimum size: 219343
Compacted row maximum size: 5839588
Compacted row mean size: 497474
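For reference, the emergency thresholds this log message refers to live in cassandra.yaml. A sketch of the 0.7-era defaults (verify against your own config and version; the values below are illustrative, not taken from this poster's setup):

```
# cassandra.yaml emergency memory valves (0.7-era defaults)
flush_largest_memtables_at: 0.75   # flush the biggest memtables when heap is this full
reduce_cache_sizes_at: 0.85        # also shrink key/row caches when heap is this full
reduce_cache_capacity_to: 0.6      # ...down to this fraction of their configured capacity
```

The message fired at 0.78 heap usage, i.e. just past the first threshold, so the question becomes why the heap is that full during a read-only workload (cache sizes, compaction, and per-CF memtable settings are the usual suspects).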
Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?
NTS is overkill in the sense that it doesn't really benefit you in a single DC, but if you think you may expand to another DC in the future it's much simpler if you were already using NTS than first migrating to NTS (changing strategy is painful). I can't think of any downsides to using NTS in a single-DC environment, so that's the safe option. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
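Concretely, with Ec2Snitch the region is reported as the data center name (e.g. us-east) and the availability zone as the rack, so a single-region NTS keyspace could be declared roughly like this in cassandra-cli (a sketch; `MyKeyspace` is a placeholder, the DC name must match what the snitch actually reports for your nodes, and the strategy_options syntax varies slightly across versions):

```
create keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{us-east: 3}];
```

Expanding to a second region later then becomes a matter of adding another DC entry to strategy_options rather than migrating strategies.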
help
Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?
Excellent to know! (And yes, I figure I'll expand someday, so I'm glad I found this out before digging a hole.) The other issue I've been pondering is a normal column family of encoded objects (in my case JSON) vs. a super column. Based on my use case, things I've read, etc., right now I'm coming down on normal + encoded. will On Tue, Apr 12, 2011 at 2:57 PM, Jonathan Ellis jbel...@gmail.com wrote: NTS is overkill in the sense that it doesn't really benefit you in a single DC, but if you think you may expand to another DC in the future it's much simpler if you were already using NTS than first migrating to NTS (changing strategy is painful). I can't think of any downsides to using NTS in a single-DC environment, so that's the safe option. -- Will Oberman Civic Science, Inc. 3030 Penn Avenue., First Floor Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
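The "normal CF + JSON-encoded objects" approach mentioned above can be sketched like this: serialize the whole object into one column value instead of spreading its fields across a super column. The `store` dict and the column name "json" are toy stand-ins for a real column family, not any client's API:

```python
import json

store = {}  # row_key -> {column_name: column_value}, standing in for a CF

def put_object(row_key, obj):
    # One column named "json" holds the entire encoded object; sort_keys
    # makes the encoding deterministic.
    store[row_key] = {"json": json.dumps(obj, sort_keys=True)}

def get_object(row_key):
    return json.loads(store[row_key]["json"])

put_object("user:1", {"name": "will", "zips": [15201]})
print(get_object("user:1"))
```

The trade-off versus super columns is that the blob is opaque to Cassandra: reads and writes are all-or-nothing per object, but you avoid super-column limitations (e.g. subcolumns could not be indexed or read individually without deserializing the whole super column in that era).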
Re: help
http://wiki.apache.org/cassandra/FAQ#unsubscribe Is this what you're looking for? Joaquin Casares DataStax Software Engineer/Support On Tue, Apr 12, 2011 at 2:03 PM, Denis Kirpichenkov den.doki.kirpichen...@gmail.com wrote:
Re: Help on decommission
How long has it been in Leaving status? Is the cluster under stress-test load while you are doing the decommission? On Apr 12, 2011, at 6:53 PM, Baskar Duraikannu wrote: I have set up a 4-node cluster for testing. When I set up the cluster, I assigned initial tokens in such a way that each node gets 25% of the load, and then started the nodes with autobootstrap=false. After all nodes were up, I loaded data using the stress test tool with a replication factor of 3. As part of my testing, I am trying to remove one of the nodes using nodetool decommission, but the node seems to be stuck in Leaving status. How do I check whether it is doing any work at all? Please help.
[root@localhost bin]# ./nodetool -h 10.140.22.25 ring
Address       Status  State    Load       Owns    Token
                                                  127605887595351923798765477786913079296
10.140.22.66  Up      Leaving  119.41 MB  25.00%  0
10.140.22.42  Up      Normal   116.23 MB  25.00%  42535295865117307932921825928971026432
10.140.22.28  Up      Normal   119.93 MB  25.00%  85070591730234615865843651857942052864
10.140.22.25  Up      Normal   116.21 MB  25.00%  127605887595351923798765477786913079296
[root@localhost bin]# ./nodetool -h 10.140.22.66 netstats
Mode: Leaving: streaming data to other nodes
Streaming to: /10.140.22.42
/var/lib/cassandra/data/Keyspace1/Standard1-f-1-Data.db/(0,120929157) progress=120929157/120929157 - 100%
/var/lib/cassandra/data/Keyspace1/Standard1-f-2-Data.db/(0,3361291) progress=0/3361291 - 0%
Not receiving any streams.
Pool Name  Active  Pending  Completed
Commands   n/a     0        17
Responses  n/a     0        108109
[root@usnynyc1cass02 bin]# ./nodetool -h 10.140.22.42 netstats
Mode: Normal
Not sending any streams.
Streaming from: /10.140.22.66
Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-f-2-Data.db/(0,3361291) progress=0/3361291 - 0%
Pool Name  Active  Pending  Completed
Commands   n/a     0        11
Responses  n/a     0        107879
Regards, Baskar
Re: flush_largest_memtables_at messages in 7.4
Your JVM heap has reached 78%, so Cassandra automatically flushes its memtables. You need to explain more about your configuration: 32 or 64 bit OS, what is max heap, how much RAM installed? If this happens under stress test conditions it's probably understandable. You should look into graphing your memory usage, or use jconsole to graph the heap during your tests.

On Apr 12, 2011, at 8:36 PM, mcasandra wrote:

I am using cassandra 7.4 and getting these messages:

Heap is 0.7802529021498031 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

How do I verify that I need to adjust any thresholds? And how do I calculate the correct value? When I got this message only reads were occurring.

create keyspace StressKeyspace
    with replication_factor = 3
    and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use StressKeyspace;
drop column family StressStandard;
create column family StressStandard
    with comparator = UTF8Type
    and keys_cached = 100
    and memtable_flush_after = 1440
    and memtable_throughput = 128;

nodetool -h dsdb4 tpstats
Pool Name              Active  Pending  Completed
ReadStage              32      281      456598
RequestResponseStage   0       0        797237
MutationStage          0       0        499205
ReadRepairStage        0       0        149077
GossipStage            0       0        217227
AntiEntropyStage       0       0        0
MigrationStage         0       0        201
MemtablePostFlusher    0       0        1842
StreamStage            0       0        0
FlushWriter            0       0        1841
FILEUTILS-DELETE-POOL  0       0        3670
MiscStage              0       0        0
FlushSorter            0       0        0
InternalResponseStage  0       0        0
HintedHandoff          0       0        15

cfstats
Keyspace: StressKeyspace
    Read Count: 460988
    Read Latency: 38.07654727454945 ms.
    Write Count: 499205
    Write Latency: 0.007409593253272703 ms.
    Pending Tasks: 0
        Column Family: StressStandard
        SSTable count: 9
        Space used (live): 247408645485
        Space used (total): 247408645485
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 1878
        Read Count: 460989
        Read Latency: 28.237 ms.
        Write Count: 499205
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 100
        Key cache size: 299862
        Key cache hit rate: 0.6031833150384193
        Row cache: disabled
        Compacted row minimum size: 219343
        Compacted row maximum size: 5839588
        Compacted row mean size: 497474

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/flush-largest-memtables-at-messages-in-7-4-tp6266221p6266221.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
json2sstable
Hi, I am trying to run json2sstable with the following command but am receiving the error below.

json2sstable -K testks -c testcf output.json /var/lib/cassandra/data/testks/testcf-f-1-Data.db
Importing 321 keys...
java.lang.NullPointerException
    at org.apache.cassandra.tools.SSTableImport.addColumnsToCF(SSTableImport.java:136)
    at org.apache.cassandra.tools.SSTableImport.addToSuperCF(SSTableImport.java:173)
    at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:228)
    at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:197)
    at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:421)
ERROR: null

Did I do anything wrong here? Thanks!
Re: Help on decommission
No. I stopped the stress test before issuing the decommission command, so it was not under ANY load. I waited for over an hour and nothing changed. Then I turned on DEBUG in log4j-server.properties and restarted the Cassandra process. As soon as I restarted, the decommissioned node left the cluster and everything was back to normal. Have you seen this behaviour before?

From: Jonathan Colby
Sent: Tuesday, April 12, 2011 3:15 PM
To: user@cassandra.apache.org
Subject: Re: Help on decommission

how long has it been in Leaving status? Is the cluster under stress test load while you are doing the decommission?

On Apr 12, 2011, at 6:53 PM, Baskar Duraikannu wrote:

I have set up a 4 node cluster for testing. When I set up the cluster, I assigned initial tokens so that each node gets 25% of the load, and then started the nodes with autobootstrap=false. After all nodes were up, I loaded data using the stress test tool with a replication factor of 3. As part of my testing, I am trying to remove one of the nodes using nodetool decommission, but the node seems to be stuck in the Leaving state. How do I check whether it is doing any work at all? Please help.

[nodetool ring and netstats output snipped; quoted in full earlier in the thread]

Regards, Baskar
Re: flush_largest_memtables_at messages in 7.4
64 bit 12 core 96 GB RAM -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/flush-largest-memtables-at-messages-in-7-4-tp6266221p6266400.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Cassandra 2 DC deployment
I think this is reasonable assuming you have enough backhaul to perform reads across DCs if read requests hit DC2 (with one copy of the data) or one replica in DC1 is down. Moreover, since you clearly stated that you would prefer availability over consistency, you should be prepared for stale reads :) On Tue, Apr 12, 2011 at 8:12 AM, Raj N raj.cassan...@gmail.com wrote: Hi experts, We are planning to deploy Cassandra in 2 datacenters. Let's assume there are 3 nodes, RF=3, 2 nodes in the 1st DC and 1 node in the 2nd DC. Under normal operations, we would read and write at QUORUM. What we want to do though is if we lose the datacenter which has 2 nodes, DC1 in this case, we want to downgrade our consistency to ONE. Basically I am saying that whenever there is a partition, prefer availability over consistency. In order to do this we plan to catch UnavailableException and take corrective action. So try QUORUM under normal circumstances, and if unavailable, try ONE. My questions: Do you guys see any flaws with this approach? What happens when DC1 comes back up and we start reading/writing at QUORUM again? Will we read stale data in this case? Thanks -Raj -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
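The fallback Raj describes can be sketched as follows. This is an illustration only: the UnavailableException class and the generic client.get call are stand-ins, not any particular Thrift client's API.

```python
class UnavailableException(Exception):
    """Stand-in for the Thrift UnavailableException."""

def read_with_fallback(client, key):
    # Try the strong read first; if too few replicas are reachable,
    # accept a possibly stale read at ONE (availability over consistency).
    try:
        return client.get(key, consistency="QUORUM")
    except UnavailableException:
        return client.get(key, consistency="ONE")
```

The same pattern would apply to writes. As discussed elsewhere in the thread, be careful that QUORUM failures caused by anything other than a real DC partition don't silently degrade your consistency guarantees.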
Update the Keyspace replication factor online
Hi, What operations will be executed (and what is the associated overhead) when the keyspace replication factor is changed online, in a multi-datacenter setup with NetworkTopologyStrategy? I checked the wiki and the archive of the mailing list and found this, but it is not very complete. http://wiki.apache.org/cassandra/Operations

Replication factor is not really intended to be changed in a live cluster either, but increasing it may be done if you (a) use ConsistencyLevel.QUORUM or ALL (depending on your existing replication factor) to make sure that a replica that actually has the data is consulted, (b) are willing to accept downtime while anti-entropy repair runs (see below), or (c) are willing to live with some clients potentially being told no data exists if they read from the new replica location(s) until repair is done.

More specifically, in this scenario: {DC1:1, DC2:1} -> {DC2:1, DC3:1}
1. Can this be done online without shutting down the cluster? I thought there is an update keyspace command in the cassandra-cli.
2. If so, what operations will be executed? Will new replicas be created in new locations (in DC3) and existing replicas be deleted in old locations (in DC1)?
3. Or will they be updated only with reads at ConsistencyLevel.QUORUM or ALL, or nodetool repair?

Thanks! Yudong
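On question 1: cassandra-cli does have an update keyspace command in 0.7. A sketch of what the invocation might look like (keyspace name is hypothetical, and the exact syntax is worth double-checking with `help update keyspace;` for your version). Note this only changes the schema metadata; actually materializing replicas in DC3 and removing them from DC1 still requires nodetool repair and nodetool cleanup:

```
[default@unknown] update keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{DC2:1, DC3:1}];
```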
Errors when starting Cassandra
Hi All, I am getting the following errors when I am trying to start Cassandra:

Error occurred during initialization of VM
Could not reserve enough space for object heap

I am using cassandra 0.7.3.

uname -a
Linux hostname 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

Please suggest. Thanks, Anurag
Re: Cassandra node's replication factor two with random partition non Bootstrap node problem
I have two Cassandra nodes. If my bootstrapped node goes down my service remains alive, but if my non-bootstrap (master) node goes down my live site goes down as well. I am using Cassandra 0.6.3; can anyone elaborate on this problem?

Assuming your RF is 2 (not 1), and that you are reading at consistency level ONE (not QUORUM, which would be 2 in the case of RF=2), single-node failures should be tolerated. In order for people to help you'd have to specify some more information. For example, your site goes down - but what is the actual error condition w.r.t. Cassandra? What is the error reported by the Cassandra client (and which client is it)? I'm not sure what you mean w.r.t. bootstrap/master etc. All nodes should be entirely equal, with the exception of nodes that are marked as seed nodes. But seed nodes going down should not cause reads and writes to fail. -- / Peter Schuller
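For reference, the quorum sizes mentioned here fall out of the standard formula; a small sketch:

```python
# QUORUM for replication factor rf is floor(rf/2) + 1. With RF=2 a
# quorum needs both replicas, so any single-node failure blocks QUORUM
# operations while CL.ONE keeps working.
def quorum(rf):
    return rf // 2 + 1

print(quorum(2))  # 2: no single-node failure tolerated at QUORUM
print(quorum(3))  # 2: one replica can be down
```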
Re: Errors when starting Cassandra
I was able to resolve this by changing the heap size Thanks Anurag On Tue, Apr 12, 2011 at 1:38 PM, Anurag Gujral anurag.guj...@gmail.comwrote: Hi All, I am getting the following errors when I am trying to start cassandra . Error occurred during initialization of VM Could not reserve enough space for object heap I am using cassandra 0.7.3 uname -a Linux hostname 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64 x86_64 x86_64 GNU/Linux Please Suggest Thanks Anurag
Re: Lot of pending tasks for writes
I am just running a simple test on a 6 node cassandra cluster: 4 GB heap, 96 GB RAM and 12 cores per host. I am inserting 1M rows with avg col size of 250k. I keep getting Dropped mutation messages in the logs. Not sure how to troubleshoot or tune it.

Average col size of 250k - that sounds to me like you're almost certainly going to be bottlenecking on disk I/O. Saturating the active slots in the mutation stage and building up pending is consistent with simply writing faster than writes can be handled. At first I was skeptical and figured maybe something was wrong, but upon re-reading and spotting your 250k column size - it's really easy to have a stress client saturate nodes with data sizes that large. The first thing I would do is to just look at what's going on on the system. For example, just run iostat -x -k 1 on the machines and see whether you're completely disk bound or not. I suspect you are, and that the effects you're seeing are simply the result of that. However that would depend on how many mutations per second you're actually sending. But if you're using out-of-the-box stress.py without rate limiting and using a column size of 250k, I am not at all surprised that you're easily able to saturate your nodes. -- / Peter Schuller
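To put rough numbers on why this workload is disk-bound (back-of-envelope only, ignoring compaction, which rewrites the data again; the RF=3 is taken from the keyspace definition quoted in the flush_largest_memtables_at thread and assumed to apply here):

```python
# 1M rows of ~250 KB each, replicated 3 ways, is on the order of three
# quarters of a terabyte written cluster-wide before any compaction
# I/O is counted.
rows = 1_000_000
col_size = 250 * 1024   # bytes, per the reported average column size
rf = 3                  # replication factor (assumed)
total_bytes = rows * col_size * rf
print(total_bytes / 10**9)  # 768.0 (GB)
```

Spread over 6 nodes that is over 100 GB per node, which an unthrottled client can push far faster than commodity disks absorb it.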
Re: flush_largest_memtables_at messages in 7.4
Heap is 0.7802529021498031 full. You may need to reduce memtable and/or cache sizes Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically How do I verify that I need to adjust any thresholds? And how to calculate correct value? Is this on the same cluster/nodes that you're doing your 250k column stresses (the other thread)? In any case, for typical cases there is: http://www.datastax.com/docs/0.7/operations/tuning -- / Peter Schuller
Re: Cassandra 2 DC deployment
When the down data center comes back up, the Quorum reads will result in a read-repair, so you will get valid data. Besides that, hinted handoff will take care of getting data replicated to a previously down node. *Eventually* though, but yes. I.e., there would be no expectation to instantly go back to full consistency once it goes back up. Also, I would argue that it's useful to consider this: If you're implementing automatic fallback to ONE whenever QUORUM fails; consider all cases where this might happen for reasons *other* than there being a legitimate partition of the DC:s. For example, some random networking issues causing fewer nodes to be up etc. A valid question is: If you simply do automatic fallback whenever QUORUM fails anyway, are you significantly increasing consistency with respect to ONE anyway? In some cases yes, but just be sure you know what you're doing... Keep in mind that when all nodes are up and all is working well, CL.ONE doesn't mean that writes won't be replicated to all nodes. It just means that only one is *required* - and same for reads. If you have some situation whereby you normally want the strict requirement that a read subsequent to a write sees the written data, that doesn't sound very compatible with automatically falling back to CL.ONE... Anyways, those are my off-the-cuff thoughts - maybe it doesn't apply in the situation in question. -- / Peter Schuller
Re: Errors when starting Cassandra
I was able to resolve this by changing the heap size And that is the preferred solution. While adjusting stuff like the kernel overcommit settings might allow the JVM to start, there is no reason ever to have a heap size larger than what physical memory on the server can actually sustain. So decreasing heap size is the appropriate course of action. -- / Peter Schuller
Re: flush_largest_memtables_at messages in 7.4
Yes -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/flush-largest-memtables-at-messages-in-7-4-tp6266221p6266726.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Lot of pending tasks for writes
It does appear that I am IO bound. Disks show about 90% util.

Well, also pay attention to the average queue size column. If there are constantly more requests waiting to be serviced than you have platters, you're almost certainly I/O bound. The utilization number can be a bit flaky sometimes, although 90% isn't too far below 100% to be attributed to inexactness in the kernel's measurements.

What are my options then? Is cassandra not suitable for columns of this size?

It depends. Cassandra is a log-structured database, meaning that all writes are sequential and you are going to be doing background compactions that imply re-reading and re-writing data. This optimization makes sense in particular for smaller values where the cost of doing sequential I/O is a lot less than seek-bound I/O, but it is less relevant for large values. The main cost of background compactions is the extra reading and writing of data that happens. If your workload is full of huge values, then the only significant cost *is* the sequential I/O. So in that sense, background compaction becomes more expensive relative to the theoretical optimum than it does for small values. It depends on details of the access pattern, but I'd say that (1) for very large values, Cassandra's advantages become less pronounced in terms of local storage on each node, although the clustering capabilities remain relevant, and that (2) depending on the details of the use-case, Cassandra *may* not be terribly suitable.

I am running stress code from hector which doesn't sound like it gives the ability to limit operations per sec. I am inserting 1M rows and then reading. Have not been able to do it in parallel because of io issues.

stress.py doesn't support any throttling, except very very indirectly by limiting the total number of threads. In a situation like this I think you need to look at what your target traffic is going to be like.
Throwing un-throttled traffic at the cluster like stress.py does is not indicative of normal traffic patterns. For typical use-cases with small columns this is still handled well, but when you are both unthrottled *and* throwing huge columns at it, there is no expectation that this is handled very well. So, for large values like this I recommend figuring out what the actual expected sustained amount of writes is, and then benchmarking that. Using stress.py out-of-the-box is not giving you much relevant information, other than the known fact that throwing huge-column traffic at Cassandra without throttling is not handled very gracefully. But that said, when using un-throttled benchmarking like stress.py - at any time where you're throwing more traffic at the cluster than it can handle, it is *fully expected* that you will see the 'active' stages be saturated and a build-up of 'pending' operations. This is the expected result of submitting a greater number of requests per second than can be processed - in pretty much any system. You queue up to some degree, and eventually you start having to drop or fail requests. The unique thing about large columns is that it becomes a lot easier to saturate a node with a single (or few) stress.py clients than it is when stressing with a more normal type of load. The extra cost of dealing with large values is higher in Cassandra than it is in stress.py; so suddenly a single stress.py client can easily saturate lots of nodes simply because you can so trivially write data at very high throughput by upping the column sizes. -- / Peter Schuller
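Since stress.py offers no throttling, here is a minimal sketch of the kind of rate limiter one could wrap around a stress client. Everything here (class name, injectable clock) is illustrative, not part of any Cassandra tool:

```python
import time

class Throttle:
    """Permit at most ops_per_sec operations; caller retries when denied."""

    def __init__(self, ops_per_sec, clock=time.monotonic):
        self.interval = 1.0 / ops_per_sec
        self.clock = clock          # injectable for testing
        self.next_allowed = clock()

    def try_acquire(self):
        now = self.clock()
        if now >= self.next_allowed:
            # Schedule the next permitted operation one interval later.
            self.next_allowed = max(self.next_allowed + self.interval, now)
            return True
        return False
```

A write loop would call try_acquire() before each insert (sleeping briefly when it returns False), keeping sustained load at the target rate rather than at whatever the client can generate.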
Re: flush_largest_memtables_at messages in 7.4
Yes Without checking I don't know the details of the memtable threshold calculations enough to be sure whether large columns are somehow causing the size estimations to be ineffective (off hand I would expect the reverse since the overhead of the Java object structures become much less significant); but if this is not the case, then this particular problem should be a matter of adjusting heap size according to your memtable thresholds. I.e., increase heap size and/or decrease memtable flush thresholds. -- / Peter Schuller
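A rough way to sanity-check heap versus memtable settings along the lines Peter suggests. The overhead multiplier here is an assumption (Java object overhead on top of the serialized memtable_throughput figure, often cited as several-fold for 0.7-era Cassandra), not a documented constant:

```python
# Estimate in-heap memtable footprint: serialized throughput threshold
# times the number of actively written CFs times a Java-overhead fudge
# factor (assumed value, tune to your own measurements).
def memtable_heap_mb(throughput_mb, active_cfs, overhead=8):
    return throughput_mb * active_cfs * overhead

# One CF at the 128 MB memtable_throughput threshold -> ~1 GB of heap
# before key caches, compaction, and request handling are counted.
print(memtable_heap_mb(128, 1))  # 1024
```

If the estimate approaches heap_size * flush_largest_memtables_at, the emergency flush is expected behaviour; either grow the heap or lower the memtable thresholds.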
Re: CLI does not list data after upgrading to 0.7.4
I'm running into the same issue with 0.7.4. You don't need to specify lexicaluuid, seems any valid key type will work- it just needs to fit with your data (ascii, bytes, etc). On Sun, Apr 10, 2011 at 7:13 PM, Patrick Julien pjul...@gmail.com wrote: put in an assumption first, so from cassandra-cli, do: assume aCF KEYS as lexicaluuid; then do your list On Sun, Apr 10, 2011 at 10:03 PM, Wenjun Che wen...@openf.in wrote: It is happening on clean 0.7.4 server as well. Here is how to reproduce: 1. create a CF with UUID as row key 2. add some data 3. list CF always returns Input length = 1 I figured out one way to fix this: run 'assume CF keys as lexicaluuid;. This issue does not happen to CLI of 0.7.0 or earlier, even running against 0.7.4 server. On Sat, Apr 9, 2011 at 5:53 PM, aaron morton aa...@thelastpickle.com wrote: Just tested the 0.7.4 cli against an clean 0.7.4 server and list worked. If I restart the server while the cli is connected i get... [default@dev] list data; Using default limit of 100 null Aaron On 8 Apr 2011, at 17:23, Wenjun Che wrote: Hello I just upgraded a 1-node setup from rc2 to 0.7.4 and ran scrub without any error. Now 'list CF' in CLI does not return any data as followings: list User; Using default limit of 100 Input length = 1 I don't see any errors or exceptions in the log. If I run CLi from 0.7.0 against 0.7.4 server, I am getting data. Thanks -- Aaron Turner http://synfin.net/ Twitter: @synfinatic http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin carpe diem quam minimum credula postero
Re: Cassandra Database Modeling
Yes for interactive == real time queries. Hadoop based techniques are for non time critical queries, but they do have greater analytical capabilities. particle_pairs: 1) Yes and no and sort of. Under the hood the get_slice api call will be used by your client library to pull back chunks of (ordered) columns. Most client libraries abstract away the chunking for you. 2) If you are using a packed structure like JSON then no, Cassandra will have no idea what you've put in the columns other than bytes. It really depends on how much data you have per pair, but generally it's easier to pull back more data than try to get exactly what you need. Downside is you have to update all the data. 3) No, you would need to update all the data for the pair. I was assuming most of the data was written once, and that your simulation had something like a stop-the-world phase between time slices where state was dumped and then read to start the next interval. You could either read it first, or we can come up with something else. distance_cf 1) the query would return a list of columns, which have a name and value (as well as a timestamp and ttl). 2) depends on the client library, if using python go for https://github.com/pycassa/pycassa It will return objects 3) returning millions of columns is going to be slow, would also be slow using a RDBMS. Creating millions of objects in python is going to be slow. You would need to have a better idea of what queries you will actually want to run to see if it's *too* slow. If it is, one approach is to store the particles at the same distance in the same column, so you need to read fewer columns. Again depends on how your sim works. Time complexity depends on the number of columns read. Finding a row will not be O(1) as it may have to read from several files. Writes are more constant than reads. But remember, you can have a lot of io and cpu power in your cluster.
Best advice is to jump in and see if the data model works for you at a small single node scale; most performance issues can be solved. Aaron

On 12 Apr 2011, at 15:34, csharpplusproject wrote:

Hi Aaron, Yes, of course it helps, I am starting to get a flavor of Cassandra -- thank you very much! First of all, by 'interactive' queries, are you referring to 'real-time' queries? (meaning, where experiment data is 'streaming', data needs to be stored and following that, the query needs to be run in real time)?

Looking at the design of the particle pairs: - key: experiment_id.time_interval - column name: pair_id - column value: distance, angle, other data packed together as JSON or some other format. A couple of questions: (1) Will a query such as pairID[ experiment_id.time_interval ] basically return an array of all pairIDs for the experiment, where each item is a 'packed' JSON? (2) Would it be possible, rather than returning the whole JSON object per every pairID, to get (say) only the distance? (3) Would it be possible to easily update certain pairIDs with new values (for example, update pairIDs = {2389, 93434} with new distance values)?

Looking at the design of the distance CF (for example): this is VERY INTERESTING. Basically you are suggesting a design that will save the actual distance between each pair of particles, and will allow queries where we can find all pairIDs (for an experiment, in a time_interval) that meet a certain distance criteria. VERY, VERY INTERESTING! A couple of questions: (1) Will a query such as distanceCF[ experiment_id.time_interval ] basically return an array of all 'zero_padded_distance.pair_id' elements for the experiment? (2) In such a case, will I get (presumably) a python list where every item is a string (which I will need to process)? (3) Given the fact that we're doing a slice on millions of columns (?), any idea how fast such an operation would be?

Just to make sure I understand, is it true that in both situations, the query complexity is basically O(1) since it's simply a HASH? Thank you for all of your help! Shalom.

-Original Message- From: aaron morton aa...@thelastpickle.com Reply-to: user@cassandra.apache.org To: user@cassandra.apache.org Subject: Re: Cassandra Database Modeling Date: Tue, 12 Apr 2011 10:43:42 +1200

The tricky part here is the level of flexibility you want for the querying. In general you will want to denormalise to support the read queries. If your queries are not interactive you may be able to use Hadoop / Pig / Hive e.g. http://www.datastax.com/products/brisk In which case you can probably have a simpler data model where you spend less effort supporting the queries. But it sounds like you need interactive queries as part of the experiment. You could store the data per pair in a standard CF (lets call it the pair cf) as follows: - key: experiment_id.time_interval - column name: pair_id - column
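One detail worth making concrete from the distance-CF idea: column names compare as strings under an ASCII/UTF8 comparator, so the distance must be zero-padded to a fixed width for a slice range to follow numeric order. A sketch (field width and separator are arbitrary choices of mine):

```python
# Build 'zero_padded_distance.pair_id' column names whose lexicographic
# order matches numeric distance order, so get_slice over a name range
# selects a distance band.
def distance_column(distance, pair_id, width=12):
    return f"{distance:0{width}d}.{pair_id}"

cols = [distance_column(d, p) for d, p in [(512, "p1"), (33, "p2"), (4096, "p3")]]
print(sorted(cols))  # p2 (33) sorts before p1 (512) before p3 (4096)
```

Without the padding, "4096" would sort before "512" as a string and the slice query would return the wrong pairs.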
Re: CL.ONE reads / RR / badness_threshold interaction
To now answer my own question, the critical points that are different from what I said earlier are: that CL.ONE does prefer *one* node (which one depending on snitch) and that RR uses digests (which are not mentioned on the wiki page [1]) instead of comparing raw requests. I updated it to mention digest queries with a link to another page to explain what that is, and why they are used. I am assuming that RR digests save on bandwidth, but to generate the digest with a row cache miss the same number of disk seeks are required (my nemesis is disk io). Yes. It's only a bandwidth optimization. So to increase pinny-ness I'll further reduce RR chance and set a badness threshold. Thanks all. Just be aware that, assuming I am not missing something, while this will indeed give you better cache locality under normal circumstances - once that closest node does go down, traffic will then go to a node which will have potentially zero cache hit rate on that data since all reads up to that point were taken by the node that just went down. So it's not an obvious win depending. -- / Peter Schuller
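A tiny model of why digests save bandwidth only: each replica still reads the full row from disk in order to hash it, and the coordinator compares its data read against the returned digests, triggering read repair on mismatch. This is illustrative code, not Cassandra's implementation (though MD5 is, to my understanding, what it historically used):

```python
import hashlib

def row_digest(row_bytes):
    # The replica must read the row either way; only this hash goes
    # over the wire for a digest query.
    return hashlib.md5(row_bytes).digest()

def needs_read_repair(data_response, digest_responses):
    d = row_digest(data_response)
    return any(d != other for other in digest_responses)
```

The disk seeks per replica are unchanged, which is exactly the point made above about digests not helping when disk I/O is the bottleneck.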
Re: CL.ONE reads / RR / badness_threshold interaction
On 04/12/2011 06:27 PM, Peter Schuller wrote: So to increase pinny-ness I'll further reduce RR chance and set a badness threshold. Thanks all. Just be aware that, assuming I am not missing something, while this will indeed give you better cache locality under normal circumstances - once that closest node does go down, traffic will then go to a node which will have potentially zero cache hit rate on that data since all reads up to that point were taken by the node that just went down. So it's not an obvious win depending.

Yeah, there's less than great behaviour when nodes are restarted or otherwise go down with this configuration. Probably still preferable for my current situation. Others' mileage may vary. http://img27.imageshack.us/img27/85/cacherestart.png
Re: quick repair tool question
On 04/12/2011 11:11 AM, Jonathan Colby wrote: I'm not sure if this is the kosher way to rebuild the sstable data, but it seemed to work. http://wiki.apache.org/cassandra/Operations#Handling_failure Option #3.
Re: flush_largest_memtables_at messages in 7.4
One thing I am noticing is that the cache hit rate is very low even though my key cache size is 1M and I have less than 1M rows. Not sure why there are so many cache misses?

Keyspace: StressKeyspace
    Read Count: 162506
    Read Latency: 45.22479006928975 ms.
    Write Count: 247180
    Write Latency: 0.011610943442026053 ms.
    Pending Tasks: 0
        Column Family: StressStandard
        SSTable count: 184
        Space used (live): 99616537894
        Space used (total): 99616537894
        Memtable Columns Count: 351
        Memtable Data Size: 171716049
        Memtable Switch Count: 543
        Read Count: 162507
        Read Latency: 317.892 ms.
        Write Count: 247180
        Write Latency: 0.006 ms.
        Pending Tasks: 0
        Key cache capacity: 100
        Key cache size: 256013
        Key cache hit rate: 0.33801452784503633
        Row cache: disabled
        Compacted row minimum size: 182786
        Compacted row maximum size: 5839588
        Compacted row mean size: 537470

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/flush-largest-memtables-at-messages-in-7-4-tp6267234p6267234.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Remove call vs. delete mutation
Is there anybody else that might see a problem with just using delete mutations instead of remove calls? I'm thinking about changing a Cassandra client to always use delete mutations when removing objects, that way the delete/remove call interface can be kept the same: 1- the delete/remove client call would always support all features: single-key/column, multi-column and slice range deletes. 2- it could be used in the same way regardless of embedding the calls into batch mutations or removing a single column/key I'd like to hear some more thoughts about this change not causing the Cassandra server to take a much higher CPU toll just because decoding mutations is much less optimized than straight removes or something like that...(I don't think so but...). In other words, if I do 1000 inserts or 1000 single-delete mutations, would the Cassandra server see much of a difference? Cheers, Josep M. On Mon, Apr 11, 2011 at 3:49 PM, aaron morton aa...@thelastpickle.com wrote: AFAIK both follow the same path internally. Aaron On 12 Apr 2011, at 06:47, Josep Blanquer wrote: All, From a thrift client perspective using Cassandra, there are currently 2 options for deleting keys/columns/subcolumns: 1- One can use the remove call: which only takes a column path so you can only delete 'one thing' at a time (an entire key, an entire supercolumn, a column or a subcolumn) 2- A delete mutation: which is more flexible as it allows to delete a list of columns an even a slice range of them within a single call. The question I have is: is there a noticeable difference in performance between issuing a remove call, or a mutation with a single delete? In other words, why would I use the remove call if it's much less flexible than the mutation? ...or another way to put it: is the remove call just there for backwards compatibility and will be superseded by the delete mutations in the future? Cheers, Josep M.
Exception on cassandra startup 0.7.4
Hello, I've been running a single node cluster (0.7.4 built from the SVN tag, running on JDK 1.6.0_21 on Ubuntu 10.10) for testing purposes. After running fine for a couple of weeks, I got the error below on startup. It sounded like the error which is supposed to be fixed by the nodetool scrub command, but since I can't run the scrub command without starting up the instance, and the instance won't start, this wasn't any use. Also, I'm fairly certain that the keyspaces in this node have only been written by 0.7.4 code. Since it was just a test node, I just blew away the data directory. Had I been thinking, I would have saved it off so I could duplicate the issue. If I can provide any other information, please let me know. Thank you, Paul

paul@host:~/apps/cassandra-svn/bin$ ./cassandra -f
 INFO 20:57:44,344 Logging initialized
 INFO 20:57:44,357 Heap size: 3051814912/3052863488
 INFO 20:57:44,358 JNA not found. Native methods will be disabled.
 INFO 20:57:44,365 Loading settings from file:/home/paul/apps/cassandra-svn/conf/cassandra.yaml
 INFO 20:57:44,474 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 20:57:44,593 Opening /home/paul/apps/cassandra/node1/data/system/Schema-f-378
 INFO 20:57:44,606 Opening /home/paul/apps/cassandra/node1/data/system/Schema-f-379
 INFO 20:57:44,608 Opening /home/paul/apps/cassandra/node1/data/system/Schema-f-377
 INFO 20:57:44,618 Opening /home/paul/apps/cassandra/node1/data/system/Migrations-f-377
 INFO 20:57:44,620 Opening /home/paul/apps/cassandra/node1/data/system/Migrations-f-378
 INFO 20:57:44,622 Opening /home/paul/apps/cassandra/node1/data/system/Migrations-f-379
 INFO 20:57:44,627 Opening /home/paul/apps/cassandra/node1/data/system/LocationInfo-f-29
 INFO 20:57:44,629 Opening /home/paul/apps/cassandra/node1/data/system/LocationInfo-f-30
 INFO 20:57:44,631 Opening /home/paul/apps/cassandra/node1/data/system/LocationInfo-f-31
 INFO 20:57:44,674 Loading schema version debf273e-631f-11e0-ac72-e700f669bcfc
 INFO 20:57:44,883 Opening /home/paul/apps/cassandra/node1/data/DaisyWorksKS/User-f-1
 INFO 20:57:44,886 Opening /home/paul/apps/cassandra/node1/data/DaisyWorksKS/User-f-2
 INFO 20:57:44,892 Opening /home/paul/apps/cassandra/node1/data/DaisyWorksTest/User-f-11
 INFO 20:57:44,895 Opening /home/paul/apps/cassandra/node1/data/DaisyWorksTest/Product-f-10
 INFO 20:57:44,908 Creating new commitlog segment /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302569864908.log
 INFO 20:57:44,916 Replaying /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302379611027.log, /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302567818267.log, /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302567841352.log, /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302567871659.log, /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302568152030.log, /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302569289258.log
 INFO 20:57:44,937 Finished reading /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302379611027.log
ERROR 20:57:44,937 Exception encountered during startup.
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:246)
    at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:262)
    at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:223)
    at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
    at java.util.concurrent.ConcurrentSkipListMap.init(ConcurrentSkipListMap.java:1443)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:363)
    at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:311)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:380)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:253)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:156)
    at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:173)
    at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
    at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readFully(DataInputStream.java:152)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:320)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:289)
    at
Re: repair never completes with finished successfully
Ah, unreadable rows, and in the validation compaction no less. Makes a little more sense now. Anyone help with the EOF when deserializing columns? Is the fix to run scrub or drop the sstable? Here's a theory, AES is trying to... 1) Create TreeRequests that specify a range we want to validate. 2) Send the TreeRequests to the local node and a neighbour. 3) Process each TreeRequest by running a validation compaction (CompactionManager.doValidationCompaction in your prev stacks). 4) When both TreeRequests return, work out the differences and then stream data if needed. Perhaps step 3 is not completing because of errors like http://www.mail-archive.com/user@cassandra.apache.org/msg12196.html If the row is spread over multiple sstables we can skip the row in one sstable. However, if it's in a single sstable, PrecompactedRow will raise an IOError if there is a problem. This is not what is in the linked error stack, which shows a row being skipped; just a hunch we could check out. Do you see any IOErrors (not exceptions) in the logs, or exceptions with doValidationCompaction in the stack? For a tree request on the node you start the repair on you should see these logs... 1) Waiting for repair requests... 2) One of Stored local tree or Stored remote tree (depending on which returns first) at DEBUG level 3) Queuing comparison If we do not have the 3rd log then we did not get a reply from either the local or the remote node. Aaron On 13 Apr 2011, at 00:57, Jonathan Colby wrote: There is no Repair session message either. 
It just starts with a message like: INFO [manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723] 2011-04-10 14:00:59,051 AntiEntropyService.java (line 770) Waiting for repair requests: [#TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.46.108.101, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.100, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.102, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.101, (DFS,main)] NETSTATS: Mode: Normal Not sending any streams. Not receiving any streams. Pool Name Active Pending Completed Commands n/a 0 150846 Responses n/a 0 443183 One node in our cluster still has unreadable rows, where the reads trip up every time for certain sstables (you've probably seen my earlier threads regarding that). My suspicion is that the bloom filter read on the node with the corrupt sstables is never reporting back to the repair, thereby causing it to hang. What would be great is a scrub tool that ignores unreadable/unserializable rows! : ) On Apr 12, 2011, at 2:15 PM, aaron morton wrote: Do you see a message starting Repair session and ending with completed successfully? Or do you see any streaming activity using nodetool netstats? Repair can hang if a neighbour dies and fails to send a requested stream. It will time out after 24 hours (I think). Aaron On 12 Apr 2011, at 23:39, Karl Hiramoto wrote: On 12/04/2011 13:31, Jonathan Colby wrote: There are a few other threads related to problems with nodetool repair in 0.7.4. However I'm not seeing any errors, just never getting a message that the repair completed successfully. 
In my production and test cluster (with just a few MB of data) the nodetool repair command never returns, and the last entry in cassandra.log is always something like: #TreeRequest manual-repair-f739ca7a-bef8-4683-b249-09105f6719d9, /10.46.108.102, (DFS,main) completed successfully: 1 outstanding But I don't see a message, even hours later, that the 1 outstanding request finished successfully. Anyone else experience this? These are physical server nodes in local data centers, not EC2. I've seen this. To fix it, try a nodetool compact and then repair. -- Karl
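Aaron's four steps amount to a Merkle-style comparison: each replica hashes its rows per range, the coordinator compares both sets of hashes, and only mismatched ranges are streamed. A toy sketch of steps 3-4 (illustrative only, not Cassandra's actual AntiEntropyService code; `buckets` is a stand-in for the tree's token ranges):

```python
import hashlib

def range_hashes(rows, buckets=8):
    # Hash each row into one of `buckets` ranges -- a simplified stand-in
    # for the Merkle tree that a validation compaction builds per TreeRequest.
    digests = [hashlib.md5() for _ in range(buckets)]
    for key in sorted(rows):
        digests[hash(key) % buckets].update(repr((key, rows[key])).encode())
    return [d.hexdigest() for d in digests]

def ranges_to_stream(local_rows, remote_rows, buckets=8):
    # Step 4: once both "trees" are back, only mismatched ranges need streaming.
    local = range_hashes(local_rows, buckets)
    remote = range_hashes(remote_rows, buckets)
    return [i for i, (a, b) in enumerate(zip(local, remote)) if a != b]
```

The failure mode discussed above fits this picture: if step 3 dies with an IOError on one replica, that replica's tree never arrives, the comparison in step 4 never runs, and the repair appears to hang -- consistent with the missing "Queuing comparison" log line.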
Re: cassandra 0.6.3 error Connection refused to host: 127.0.0.1;
Can you connect from the local machine using 127.0.0.1? Are you running any sort of firewall? Check you can connect from the node to the JMX port (8080 by default) using telnet. Aaron On 13 Apr 2011, at 04:25, Ali Ahsan wrote: Can anyone guide me on this issue? On 04/12/2011 04:07 PM, Ali Ahsan wrote: Hi All, I have migrated my server to CentOS 5.5. Everything is up, but I am facing a small issue. I have two Cassandra nodes: 10.0.0.4 cassandra2 10.0.0.3 cassandra1 I am using OpenJDK with Cassandra. We are facing the following error when using nodetool, but only on one server, cassandra2. The hosts file is also pasted below. Please let me know how I can fix this issue. - sh nodetool -h 10.0.0.3 ring Error connecting to remote JMX agent! java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: --- sh nodetool -h 10.0.0.4 ring Address Status Load Range Ring 129069858893052904163677015069685590304 10.0.0.3 Up 10.02 GB 104465788091875410298027059042850717029 |--| 10.0.0.4 Up 9.98 GB 129069858893052904163677015069685590304 |--| Hosts file # Do not remove the following line, or various programs # that require network functionality will fail. 127.0.0.1 localhost.localdomain localhost 10.0.0.4 cassandra2.pringit.com #::1 localhost6.localdomain6 localhost6 -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahore, P.O. Box 676 Lahore 54000, Pakistan Tel: +92 (0)42 3758 7140 Ext. 128 Mobile: +92 (0)345 831 8769 Fax: +92 (0)42 3758 0027 Email: ali.ah...@panasiangroup.com www.ebusiness-pg.com www.panasiangroup.com
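For background: "Connection refused to host: 127.0.0.1" usually means the RMI stub handed back by the JMX agent carries an address that resolves to loopback on the failing node, typically because /etc/hosts maps the node's own hostname onto the 127.0.0.1 line; fixing the hosts file (or setting java.rmi.server.hostname in the JVM options) is the usual cure. Aaron's telnet check can also be scripted; this is a generic TCP reachability probe against the JMX port (8080 by default), nothing Cassandra-specific:

```python
import socket

def port_open(host, port, timeout=2.0):
    # True if a plain TCP connect to host:port succeeds within `timeout`.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (socket.error, socket.timeout):
        return False
```

Something like `port_open("10.0.0.3", 8080)` separates a firewall or bind problem (the connect itself fails) from the RMI-redirect problem (the connect succeeds but nodetool still fails with the loopback address).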
Re: forced index creation?
Built indexes are there for me [default@unknown] describe keyspace Keyspace1; Keyspace: Keyspace1: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Replication Factor: 1 Column Families: ColumnFamily: Indexed1 default_validation_class: org.apache.cassandra.db.marshal.LongType Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 0.145312498/31/1440 (millions of ops/minutes/MB) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Built indexes: [Indexed1.birthdate_idx] Column Metadata: Column Name: birthdate Validation Class: org.apache.cassandra.db.marshal.LongType Index Name: birthdate_idx Index Type: KEYS When the index is created existing data is indexed async, and any new data is indexed as part of the write. Not sure how to force/check things though. Can you turn logging up to DEBUG and compare the requests between the two clusters ? Aaron On 13 Apr 2011, at 05:46, Sasha Dolgy wrote: hi, just deployed a new keyspace on 0.7.4 and added the following column family: create column family applications with comparator=UTF8Type and column_metadata=[ {column_name: app_name, validation_class: UTF8Type}, {column_name: app_uri, validation_class: UTF8Type,index_type: KEYS}, {column_name: app_id, validation_class: UTF8Type} ]; I then proceeded to add two new rows of data to it. When i try and query the secondary index on app_uri, my query with phpcassa fails. on the same CF in a different cluster, it works fine. 
when comparing the CF between the clusters, I see there's a difference: --- Built indexes: --- shows up when I run -- describe keyspace foobar; Column Metadata: Column Name: app_name (app_name) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: app_id (app_id) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Column Name: app_uri (app_uri) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Type: KEYS Checking out a bit further: get applications where 'app_uri' = 'get-test'; --- RowKey: 9d699733-9afe-4a41-83ca-c60d040dacc0 get applications where 'app_id' = '9d699733-9afe-4a41-83ca-c60d040dacc0'; No indexed columns present in index clause with operator EQ So... I can see that the secondary indexes are working. Question 1: Has Built indexes been removed from the describe keyspace output? Or have I done something wrong? Question 2: Is there a way to force secondary index creation? -- Sasha Dolgy sasha.do...@gmail.com
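One way to attack question 2 on 0.7.x is to re-declare the column metadata with the CLI's update column family, which should kick off the asynchronous index build for any column newly marked index_type: KEYS. A sketch using the CF from this thread (index_name is my addition, not something from the original message; syntax per the 0.7 cassandra-cli):

```
update column family applications with comparator = UTF8Type
  and column_metadata = [
    {column_name: app_name, validation_class: UTF8Type},
    {column_name: app_uri,  validation_class: UTF8Type,
     index_type: KEYS, index_name: app_uri_idx},
    {column_name: app_id,   validation_class: UTF8Type}
  ];

describe keyspace foobar;
```

After the build finishes, 'Built indexes' in the describe output should list the index; until then, indexed queries against existing data may come back empty.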
Re: Ec2Snitch + NetworkTopologyStrategy if only in one region?
If you can use standard + encoded I would go with that. Aaron On 13 Apr 2011, at 07:07, William Oberman wrote: Excellent to know! (and yes, I figure I'll expand someday, so I'm glad I found this out before digging a hole). The other issue I've been pondering is a normal column family of encoded objects (in my case JSON) vs. a super column. Based on my use case, things I've read, etc... right now I'm coming down on normal + encoded. will On Tue, Apr 12, 2011 at 2:57 PM, Jonathan Ellis jbel...@gmail.com wrote: NTS is overkill in the sense that it doesn't really benefit you in a single DC, but if you think you may expand to another DC in the future it's much simpler if you were already using NTS than first migrating to NTS (changing strategy is painful). I can't think of any downsides to using NTS in a single-DC environment, so that's the safe option. On Tue, Apr 12, 2011 at 1:15 PM, William Oberman ober...@civicscience.com wrote: Hi, I'm getting closer to committing to Cassandra, and now I'm onto system/IT issues and questions. I'm in the Amazon EC2 cloud. I previously used this forum to discover the best practice for disk layouts (large instance + the two ephemeral disks in RAID0 for data + root volume for everything else). Now I'm hoping to confirm bits and pieces of things I've read about for snitch/replication strategies. I was thinking of using endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' (for people hitting this from the mailing list or google, I feel obligated to note that the former setting is in cassandra.yaml, and the latter is an option on a keyspace). But, I'm only in one region. Is using the Amazon snitch/NetworkTopologyStrategy overkill given everything I have is in one DC (I believe region==DC and availability_zone==rack)? I'm using multiple availability zones for some level of redundancy, I'm just not yet at the point where I'm using multiple regions. 
If someday I move to using multiple regions, would that change the answer? Thanks! -- Will Oberman Civic Science, Inc. 3030 Penn Avenue., First Floor Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
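For readers hitting this from the archive, the two settings under discussion can be sketched like this for a single-region cluster (the keyspace name MyKS and the replica count 3 are hypothetical; "us-east" is the DC name Ec2Snitch derives from the region, with the availability zone as the rack):

```
# cassandra.yaml, per node:
endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch

# cassandra-cli, per keyspace:
create keyspace MyKS
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{us-east:3}];
```

If a second region is added later, only the strategy_options need extending (e.g. adding a us-west entry), which is exactly why starting with NTS avoids a painful strategy migration.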
Re: json2sstable
Reading the code, it looks like it could not find a subColumns item for the row in the JSON file. The target CF is a super CF; is the data from a super CF? Aaron On 13 Apr 2011, at 07:24, Steven Teo wrote: Hi, I am trying to run json2sstable with the following command but am receiving the below error. json2sstable -K testks -c testcf output.json /var/lib/cassandra/data/testks/testcf-f-1-Data.db Importing 321 keys... java.lang.NullPointerException at org.apache.cassandra.tools.SSTableImport.addColumnsToCF(SSTableImport.java:136) at org.apache.cassandra.tools.SSTableImport.addToSuperCF(SSTableImport.java:173) at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:228) at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:197) at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:421) ERROR: null Is there anything I did wrong here? Thanks!
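The NPE fires where addColumnsToCF looks up a subColumns array, so for a super CF each supercolumn entry in the input JSON must carry one. Illustratively only, with made-up hex names and timestamps -- the exact field names and encoding are whatever sstable2json emits in your build, so dumping a known-good super CF with sstable2json is the safest way to see the required shape:

```
{
  "726f776b6579": {
    "7375706572636f6c": {
      "deletedAt": -9223372036854775808,
      "subColumns": [
        ["636f6c6e616d65", "76616c7565", 1302500000000, false]
      ]
    }
  }
}
```

Data dumped from a standard CF (a flat list of columns per row key, with no subColumns level) will trip exactly this NullPointerException when imported into a super CF.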
Re: Update the Keyspace replication factor online
Are you changing the replication factor or moving nodes? To change the RF you need to repair, and then once all repairing is done run cleanup to remove the old data. You can move whole nodes by moving all their data with them, assigning a new IP, and updating the topology file if used. Aaron On 13 Apr 2011, at 07:56, Yudong Gao wrote: Hi, What operations will be executed (and what is the associated overhead) when the Keyspace replication factor is changed online, in a multi-datacenter setup with NetworkTopologyStrategy? I checked the wiki and the archive of the mailing list and found this, but it is not very complete. http://wiki.apache.org/cassandra/Operations Replication factor is not really intended to be changed in a live cluster either, but increasing it may be done if you (a) use ConsistencyLevel.QUORUM or ALL (depending on your existing replication factor) to make sure that a replica that actually has the data is consulted, (b) are willing to accept downtime while anti-entropy repair runs (see below), or (c) are willing to live with some clients potentially being told no data exists if they read from the new replica location(s) until repair is done. More specifically, in this scenario: {DC1:1, DC2:1} -> {DC2:1, DC3:1} 1. Can this be done online without shutting down the cluster? I thought there is an update keyspace command in the cassandra-cli. 2. If so, what operations will be executed? Will new replicas be created in new locations (in DC3) and existing replicas be deleted in old locations (in DC1)? 3. Or will they be updated only with reads at ConsistencyLevel.QUORUM or ALL, or nodetool repair? Thanks! Yudong
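Aaron's procedure, spelled out as a command sketch (keyspace and host names are hypothetical; the metadata change itself is online via the CLI, but the data only moves once repair runs, and cleanup should wait until all repairs have finished):

```
# cassandra-cli: change the strategy options on the live keyspace
update keyspace MyKS
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC2:1, DC3:1}];

# shell, on each node in turn: pull in the data the node is now a replica for
nodetool -h node1.example.com repair

# shell, on each node, only after every repair has completed:
nodetool -h node1.example.com cleanup
```

Until repair completes, reads served by the new replica locations (DC3 here) can return no data, which is what the wiki's caveats (a)-(c) are about.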
Re: flush_largest_memtables_at messages in 7.4
One thing I am noticing is that the cache hit rate is very low even though my key cache size is 1M and I have less than 1M rows. Not sure why there are so many cache misses? The key cache should be strictly LRU for read-only workloads. For write/read workloads it may not be strictly LRU because compaction causes key cache migration. In your case: Key cache capacity: 100 Key cache size: 256013 Key cache hit rate: 0.33801452784503633 So you have only 256k in the cache. Have you run for long enough after enabling it for it to actually be fully populated? -- / Peter Schuller
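To make "strictly LRU" and the hit-rate arithmetic concrete, here is a toy LRU key cache with hit-rate accounting -- purely illustrative, not Cassandra's implementation (which, as noted above, deviates from LRU when compaction migrates cache entries):

```python
from collections import OrderedDict

class KeyCache:
    """Toy LRU cache tracking a hit rate like the JMX 'Key cache hit rate'."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)   # refresh recency on a hit
            return self.entries[key]
        return None                          # a miss

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    @property
    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```

A cache that has not yet been fully populated behaves exactly like Peter describes: every first touch of a key is a miss, so the measured hit rate stays low until the working set has cycled through at least once.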
Re: Cassandra Database Modeling
Aaron, Thank you so much for your help. It is greatly appreciated! Looking at the design of the particle pairs: - key: expriement_id.time_interval - column name: pair_id - column value: distance, angle, other data packed together as JSON or some other format You wrote that retrieving millions of columns (I will have about 10,000,000 particle pairs) would be slow. You are also right that the retrieval of millions of columns into Python won't be fast. Suppose my desired query is to get all particle pairs on time interval [ Tn..T(n+1) ] where the distance between the two particles is smaller than X and the angle between the two particles is greater than Y. In such a query (as the above), given the fact that retrieving millions of columns could be slow, would it be best to, say, 'concatenate' all values for all particle pairs for a given 'expriement_id.time_interval' into one column? If data is stored in this way, I will be getting from Cassandra a binary string / JSON object that I will have to 'unpack' in my application. Is this a recommended approach? Are there better approaches? Is there a limit to the size that can be stored in one 'cell' (by 'cell' I mean the intersection between a key and a data column)? Is there a limit to the size of data of one key? one data column? Thanks in advance for any help / guidance. -Original Message- From: aaron morton aa...@thelastpickle.com Reply-to: user@cassandra.apache.org To: user@cassandra.apache.org Subject: Re: Cassandra Database Modeling Date: Wed, 13 Apr 2011 10:14:21 +1200 Yes for interactive == real time queries. Hadoop based techniques are for non time critical queries, but they do have greater analytical capabilities. particle_pairs: 1) Yes and no and sort of. Under the hood the get_slice api call will be used by your client library to pull back chunks of (ordered) columns. Most client libraries abstract away the chunking for you. 
2) If you are using a packed structure like JSON then no, Cassandra will have no idea what you've put in the columns other than bytes. It really depends on how much data you have per pair, but generally it's easier to pull back more data than try to get exactly what you need. Downside is you have to update all the data. 3) No, you would need to update all the data for the pair. I was assuming most of the data was written once, and that your simulation had something like a stop-the-world phase between time slices where state was dumped and then read to start the next interval. You could either read it first, or we can come up with something else. distance_cf 1) the query would return a list of columns, which have a name and value (as well as a timestamp and ttl). 2) depends on the client library; if using python go for https://github.com/pycassa/pycassa It will return objects 3) returning millions of columns is going to be slow; it would also be slow using an RDBMS. Creating millions of objects in Python is going to be slow. You would need to have a better idea of what queries you will actually want to run to see if it's *too* slow. If it is, one approach is to store the particles at the same distance in the same column, so you need to read fewer columns. Again depends on how your sim works. Time complexity depends on the number of columns read. Finding a row will not be O(1) as it may have to read from several files. Writes are more constant than reads. But remember, you can have a lot of io and cpu power in your cluster. Best advice is to jump in and see if the data model works for you at a small single node scale; most performance issues can be solved. Aaron On 12 Apr 2011, at 15:34, csharpplusproject wrote: Hi Aaron, Yes, of course it helps, I am starting to get a flavor of Cassandra -- thank you very much! First of all, by 'interactive' queries, are you referring to 'real-time' queries? 
(meaning, where experiments data is 'streaming', data needs to be stored and following that, the query needs to be run in real time)? Looking at the design of the particle pairs: - key: expriement_id.time_interval - column name: pair_id - column value: distance, angle, other data packed together as JSON or some other format A couple of questions: (1) Will a query such as pairID[ expriement_id.time_interval ] basically return an array of all pairIDs for the experiment, where each item is a 'packed' JSON? (2) Would it be possible, rather than returning the whole JSON object per every pairID, to get (say) only the distance? (3) Would it be possible to easily update certain 'pairIDs' with new values (for example, update pairIDs = {2389, 93434} with new distance values)? Looking at the design of the distance CF (for example): this is VERY INTERESTING. basically you are suggesting a design that will save the actual distance between each pair of particles, and will allow queries where we can find all pairIDs (for an experiment, on time_interval) that meet a certain distance criteria.
Re: Cassandra Database Modeling
Is there a limit to the size that can be stored in one 'cell' (by 'cell' I mean the intersection between a key and a data column)? Is there a limit to the size of data of one key? one data column? http://wiki.apache.org/cassandra/CassandraLimitations The data of cassandra are partitioned by the row key; therefore, if you want to put all pairs into the same row, you should consider the disk size.
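The reason the distance CF design supports "distance smaller than X" queries is that zero-padding the distance makes the lexicographic column order match the numeric order, so a single column slice from the empty start up to the padded bound returns exactly the qualifying pairs. A small sketch of the naming scheme (the 12/4 padding widths are arbitrary choices, and the in-memory filter below stands in for what would be a get_slice range query in Cassandra):

```python
def distance_column(distance, pair_id, width=12, precision=4):
    # Zero-padded so that string sort order == numeric sort order.
    return "%0*.*f.%s" % (width, precision, distance, pair_id)

def pairs_within(columns, max_distance, width=12, precision=4):
    # Emulates a column slice from '' up to the padded bound; in Cassandra
    # this would be a get_slice with start='' and finish=the padded bound.
    bound = "%0*.*f" % (width, precision, max_distance)
    return [name for name in sorted(columns) if name[:width] <= bound]
```

Because the comparator orders column names as strings, an unpadded "12.0" would sort before "5.25"; the padding is what makes the range slice meaningful.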
Re: Exception on cassandra startup 0.7.4
This is a problem reading the commitlog, which is not something scrub can help with. Looks like there is bad data in /home/paul/apps/cassandra/node1/commitlog/CommitLog-1302567818267.log. Somehow it's corrupt in a way that the checksum is ok. (Which sounds like https://issues.apache.org/jira/browse/CASSANDRA-2128 but that was fixed for 0.7.2.) Quick fix to get up and running again would be to just remove that file. (Any data in it will be missing, of course.) Longer term you should gzip it (and your system keyspace, so we get the schema too) and attach it to a ticket so we can take a closer look. On Tue, Apr 12, 2011 at 7:22 PM, Paul Lorenz plor...@gmail.com wrote: Hello, I've been running a single node cluster (0.7.4 built from the SVN tag, running on JDK 1.6.0_21 on Ubuntu 10.10) for testing purposes. After running fine for a couple of weeks, I got the error below on startup. (snip)
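Jonathan's quick fix can be scripted. This sketch quarantines the suspect segment instead of deleting it and gzips it for the ticket; the paths here are demo stand-ins (on a real node, COMMITLOG_DIR is whatever commitlog_directory in cassandra.yaml points at, and the node must be stopped first), and it creates a dummy file so the demo is self-contained:

```shell
# Demo stand-ins; on a real node, point COMMITLOG_DIR at the directory
# named by commitlog_directory in cassandra.yaml, with the node stopped.
COMMITLOG_DIR=/tmp/commitlog-demo
BAD_SEGMENT=CommitLog-1302567818267.log

mkdir -p "$COMMITLOG_DIR/quarantine"
touch "$COMMITLOG_DIR/$BAD_SEGMENT"   # dummy stand-in for the corrupt segment

# Move the bad segment aside rather than deleting it, then gzip it so it
# can be attached to a JIRA ticket along with the system keyspace.
mv "$COMMITLOG_DIR/$BAD_SEGMENT" "$COMMITLOG_DIR/quarantine/"
gzip -f "$COMMITLOG_DIR/quarantine/$BAD_SEGMENT"
```

Restarting after this skips replay of the removed segment; any mutations it held are lost, which is the trade-off Jonathan notes above.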
Re: Cassandra Database Modeling
Steven, Thank you. You wrote: The data of cassandra are partitioned by the row key; therefore, if you want to put all pairs into the same row, you should consider the disk size Can you please explain why the disk size is / might be a problem? Thanks, Shalom. -Original Message- From: Steven Yen-Liang Su xpste...@gmail.com Reply-to: user@cassandra.apache.org To: user@cassandra.apache.org Subject: Re: Cassandra Database Modeling Date: Wed, 13 Apr 2011 12:16:00 +0800 Is there a limit to the size that can be stored in one 'cell' (by 'cell' I mean the intersection between a key and a data column)? is there a limit to the size of data of one key? one data column? http://wiki.apache.org/cassandra/CassandraLimitations The data of cassandra are partitioned by the row key; therefore, if you want to put all pairs into the same row, you should consider the disk size. Thanks in advance for any help / guidance. -Original Message- From: aaron morton aa...@thelastpickle.com Reply-to: user@cassandra.apache.org To: user@cassandra.apache.org Subject: Re: Cassandra Database Modeling Date: Wed, 13 Apr 2011 10:14:21 +1200 Yes for interactive == real time queries. Hadoop based techniques are non time critical queries, but they do have greater analytical capabilities. particle_pairs: 1) Yes and no and sort of. Under the hood the get_slice api call will be used by your client library to pull back chunks of (ordered) columns. Most client libraries abstract away the chunking for you. 2) If you are using a packed structure like JSON then no, Cassandra will have no idea what you've put in the columns other than bytes . It really depends on how much data you have per pair, but generally it's easier to pull back more data than try to get exactly what you need. Downside is you have to update all the data. 3) No, you would need to update all the data for the pair. 
I was assuming most of the data was written once, and that your simulation had something like a stop-the-world phase between time slices where state was dumped and then read to start the next interval. You could either read it first, or we can come up with something else.

distance_cf:

1) The query would return a list of columns, which have a name and value (as well as a timestamp and TTL).

2) Depends on the client library; if using Python, go for https://github.com/pycassa/pycassa It will return objects.

3) Returning millions of columns is going to be slow; it would also be slow using an RDBMS. Creating millions of objects in Python is going to be slow. You would need a better idea of what queries you will actually want to run to see if it's *too* slow. If it is, one approach is to store the particles at the same distance in the same column, so you need to read fewer columns. Again, depends on how your sim works.

Time complexity depends on the number of columns read. Finding a row will not be O(1) as it may have to read from several files. Writes are more constant than reads. But remember, you can have a lot of IO and CPU power in your cluster.

Best advice is to jump in and see if the data model works for you at a small single-node scale; most performance issues can be solved.

Aaron

On 12 Apr 2011, at 15:34, csharpplusproject wrote:

Hi Aaron, yes, of course it helps, I am starting to get a flavor of Cassandra -- thank you very much!

First of all, by 'interactive' queries, are you referring to 'real-time' queries (meaning, where experiment data is 'streaming', data needs to be stored and, following that, the query needs to be run in real time)?
Looking at the design of the particle pairs:

- key: experiment_id.time_interval
- column name: pair_id
- column value: distance, angle, other data packed together as JSON or some other format

A couple of questions:

(1) Will a query such as pairID[ experiment_id.time_interval ] basically return an array of all pairIDs for the experiment, where each item is a 'packed' JSON?

(2) Would it be possible, rather than returning the whole JSON object per every pairID, to get (say) only the distance?

(3) Would it be possible to easily update certain 'pairIDs' with new values (for example, update pairIDs = {2389, 93434} with new distance values)?
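On Steven's disk-size point earlier in the thread: with the layout above, every pair for an experiment/time-interval lands in a single row, so row size grows linearly with the pair count. A rough back-of-the-envelope sketch (the 100-byte value and 30-byte per-column overhead figures are assumptions for illustration, not measured numbers):

```python
# Rough estimate of the on-disk footprint of one wide row holding all pairs.
# The per-pair sizes below are illustrative assumptions, not measured values.

def estimate_row_size(num_pairs, value_bytes=100, column_overhead_bytes=30):
    """Approximate bytes for one row containing num_pairs columns."""
    return num_pairs * (value_bytes + column_overhead_bytes)

for n in (10_000, 1_000_000, 100_000_000):
    mb = estimate_row_size(n) / 1024 ** 2
    print(f"{n:>11,} pairs -> roughly {mb:,.0f} MB in a single row")
```

At the high end a single row reaches into the tens of gigabytes, which is where the CassandraLimitations page linked above becomes relevant.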
Re: json2sstable
The data is a custom JSON; it seems like I may have got the structure wrong. How should the import JSON look?

Steven Teo

On 13-Apr-2011, at 10:43 AM, aaron morton wrote:

Reading the code, it looks like it could not find a subColumns item for the row in the JSON file. The target CF is a super CF; is the data from a super CF?

Aaron

On 13 Apr 2011, at 07:24, Steven Teo wrote:

Hi, I am trying to run json2sstable with the following command but am receiving the below error.

json2sstable -K testks -c testcf output.json /var/lib/cassandra/data/testks/testcf-f-1-Data.db

Importing 321 keys...
java.lang.NullPointerException
	at org.apache.cassandra.tools.SSTableImport.addColumnsToCF(SSTableImport.java:136)
	at org.apache.cassandra.tools.SSTableImport.addToSuperCF(SSTableImport.java:173)
	at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:228)
	at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:197)
	at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:421)
ERROR: null

Anything I did wrongly here? Thanks!
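For what it's worth, json2sstable for a super column family expects the same shape that sstable2json produces: each row key maps to an object of super columns, and each super column must carry a subColumns list (a missing subColumns is what the NullPointerException above trips on). A hedged sketch of that shape follows — the key and column names here are made up, and the exact field names are easiest to confirm by running sstable2json on an existing super-CF sstable from the same version:

```json
{
  "row1": {
    "supercol1": {
      "deletedAt": -9223372036854775808,
      "subColumns": [
        ["subcol1", "value1", 1302665640000000],
        ["subcol2", "value2", 1302665640000000]
      ]
    }
  }
}
```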
Re: flush_largest_memtables_at messages in 7.4
Does it really matter how long cassandra has been running? I thought it will keep keys of 1M at least. Regarding your previous question about queue size in iostat I see it ranging from 114-300.
Re: flush_largest_memtables_at messages in 7.4
Does it really matter how long cassandra has been running? I thought it will keep keys of 1M at least.

It will keep up to the limit, and it will save caches periodically and reload them on start. But the cache needs to be populated by traffic first. If you wrote a bunch of data, enabled the row cache, and began reading, you have to first wait for population of the cache prior to looking at cache locality. Note that the saving of caches is periodic, and if you were constantly restarting nodes during testing maybe it never got saved with the full set of keys.

Regarding your previous question about queue size in iostat I see it ranging from 114-300.

Saturated.

--
/ Peter Schuller
Error while startup - latest trunk build
Hi, I am getting the following exception while starting a Cassandra trunk build. Am I missing any configuration options? Please help.

Thanks, Shariq.

Stack trace:

~/work/cassandra-trunk$ ./bin/cassandra -f
INFO 11:04:07,864 Logging initialized
INFO 11:04:07,877 Heap size: 1893728256/1893728256
INFO 11:04:07,878 JNA not found. Native methods will be disabled.
INFO 11:04:07,885 Loading settings from file:/home/shariq/work/cassandra-trunk/conf/cassandra.yaml
INFO 11:04:08,003 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 11:04:08,083 Global memtable threshold is enabled at 602MB
INFO 11:04:08,136 reading saved cache /var/lib/cassandra/saved_caches/system-IndexInfo-KeyCache
INFO 11:04:08,145 Opening /var/lib/cassandra/data/system/IndexInfo-f-5
INFO 11:04:08,163 reading saved cache /var/lib/cassandra/saved_caches/system-Schema-KeyCache
INFO 11:04:08,165 Opening /var/lib/cassandra/data/system/Schema-f-57
INFO 11:04:08,169 Opening /var/lib/cassandra/data/system/Schema-f-59
INFO 11:04:08,171 Opening /var/lib/cassandra/data/system/Schema-f-58
INFO 11:04:08,176 Opening /var/lib/cassandra/data/system/Migrations-f-58
INFO 11:04:08,177 Opening /var/lib/cassandra/data/system/Migrations-f-57
INFO 11:04:08,178 Opening /var/lib/cassandra/data/system/Migrations-f-59
INFO 11:04:08,182 reading saved cache /var/lib/cassandra/saved_caches/system-LocationInfo-KeyCache
INFO 11:04:08,185 Opening /var/lib/cassandra/data/system/LocationInfo-f-46
INFO 11:04:08,188 Opening /var/lib/cassandra/data/system/LocationInfo-f-47
INFO 11:04:08,191 Opening /var/lib/cassandra/data/system/LocationInfo-f-45
INFO 11:04:08,236 Loading schema version 33ac001b-60fc-11e0-8f89-e700f669bcfc
ERROR 11:04:08,463 Exception encountered during startup.
java.lang.RuntimeException: org.apache.cassandra.config.ConfigurationException: SimpleStrategy requires a replication_factor strategy option.
	at org.apache.cassandra.db.Table.init(Table.java:277)
	at org.apache.cassandra.db.Table.open(Table.java:109)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:160)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: org.apache.cassandra.config.ConfigurationException: SimpleStrategy requires a replication_factor strategy option.
	at org.apache.cassandra.locator.SimpleStrategy.validateOptions(SimpleStrategy.java:75)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.createReplicationStrategy(AbstractReplicationStrategy.java:262)
	at org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:327)
	at org.apache.cassandra.db.Table.init(Table.java:273)
	... 4 more
Exception encountered during startup.
java.lang.RuntimeException: org.apache.cassandra.config.ConfigurationException: SimpleStrategy requires a replication_factor strategy option.
	at org.apache.cassandra.db.Table.init(Table.java:277)
	at org.apache.cassandra.db.Table.open(Table.java:109)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:160)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: org.apache.cassandra.config.ConfigurationException: SimpleStrategy requires a replication_factor strategy option.
	at org.apache.cassandra.locator.SimpleStrategy.validateOptions(SimpleStrategy.java:75)
	at org.apache.cassandra.locator.AbstractReplicationStrategy.createReplicationStrategy(AbstractReplicationStrategy.java:262)
	at org.apache.cassandra.db.Table.createReplicationStrategy(Table.java:327)
	at org.apache.cassandra.db.Table.init(Table.java:273)
	... 4 more
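The error suggests a schema migration issue: on trunk at that time, replication_factor moved from a top-level keyspace attribute into strategy_options, so a keyspace defined the old way fails validation when the node loads its schema. On a disposable dev instance, clearing the old data and recreating the schema is the quickest way out. For reference, a hedged sketch of a definition that satisfies the new validation, in cassandra-cli syntax of that era (keyspace name is hypothetical, and the exact syntax may vary between trunk revisions):

```
create keyspace MyKeyspace
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = [{replication_factor:1}];
```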
Re: quick repair tool question
Cool! And I thought I made that one up myself : )

On Apr 13, 2011, at 2:13 AM, Chris Burroughs wrote:

On 04/12/2011 11:11 AM, Jonathan Colby wrote:

I'm not sure if this is the kosher way to rebuild the sstable data, but it seemed to work.

http://wiki.apache.org/cassandra/Operations#Handling_failure Option #3.
Re: repair never completes with finished successfully
Great tips. I will investigate further with your suggestions in mind. Hopefully the problem has gone away since I pulled in fresh data on the node with problems.

On Apr 13, 2011, at 3:54 AM, aaron morton wrote:

Ah, unreadable rows, and in the validation compaction no less. Makes a little more sense now.

Anyone help with the EOF when deserializing columns? Is the fix to run scrub or drop the sstable?

Here's a theory. AES is trying to...

1) Create TreeRequests that specify a range we want to validate.
2) Send TreeRequests to the local node and a neighbour.
3) Process each TreeRequest by running a validation compaction (CompactionManager.doValidationCompaction in your prev stacks).
4) When both TreeRequests return, work out the differences and then stream data if needed.

Perhaps step 3 is not completing because of errors like http://www.mail-archive.com/user@cassandra.apache.org/msg12196.html

If the row is over multiple sstables we can skip the row in one sstable. However, if it's in a single sstable, PrecompactedRow will raise an IOError if there is a problem. This is not what is in the linked error stack, which shows a row being skipped; just a hunch we could check out.

Do you see any IOErrors (not exceptions) in the logs, or exceptions with doValidationCompaction in the stack?

For a tree request on the node you start compaction on you should see these logs...

1) Waiting for repair requests...
2) One of "Stored local tree" or "Stored remote tree", depending on which returns first (at DEBUG level)
3) Queueing comparison

If we do not have the 3rd log then we did not get a reply from either local or remote.

Aaron

On 13 Apr 2011, at 00:57, Jonathan Colby wrote:

There is no Repair session message either.
It just starts with a message like:

INFO [manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723] 2011-04-10 14:00:59,051 AntiEntropyService.java (line 770) Waiting for repair requests: [#TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.46.108.101, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.100, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.102, (DFS,main), #TreeRequest manual-repair-2af33a51-f46a-4ba2-b1fb-ead5159dc723, /10.47.108.101, (DFS,main)]

NETSTATS:
Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name    Active  Pending  Completed
Commands        n/a        0     150846
Responses       n/a        0     443183

One node in our cluster still has unreadable rows, where the reads trip up every time for certain sstables (you've probably seen my earlier threads regarding that). My suspicion is that the bloom filter read on the node with the corrupt sstables is never reporting back to the repair, thereby causing it to hang.

What would be great is a scrub tool that ignores unreadable/unserializable rows! : )

On Apr 12, 2011, at 2:15 PM, aaron morton wrote:

Do you see a message starting "Repair session" and ending with "completed successfully"? Or do you see any streaming activity using nodetool netstats?

Repair can hang if a neighbour dies and fails to send a requested stream. It will timeout after 24 hours (I think).

Aaron

On 12 Apr 2011, at 23:39, Karl Hiramoto wrote:

On 12/04/2011 13:31, Jonathan Colby wrote:

There are a few other threads related to problems with nodetool repair in 0.7.4. However, I'm not seeing any errors, just never getting a message that the repair completed successfully.
In my production and test clusters (with just a few MB of data), the repair nodetool prompt never returns and the last entry in cassandra.log is always something like:

#TreeRequest manual-repair-f739ca7a-bef8-4683-b249-09105f6719d9, /10.46.108.102, (DFS,main) completed successfully: 1 outstanding

But I don't see a message, even hours later, that the 1 outstanding request finished successfully. Anyone else experience this? These are physical server nodes in local data centers, not EC2.

I've seen this. To fix it, try a nodetool compact, then repair.

--
Karl
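Aaron's three-log handshake earlier in this thread boils down to a simple gate: the comparison is queued only once both the local and the remote tree responses have been stored, so a replica that never answers (for example because validation compaction died on a corrupt row) leaves the repair waiting indefinitely. A toy model of that gating (plain Python; the class and method names are mine, not Cassandra's):

```python
class TreeRequestTracker:
    """Toy model: queue the tree comparison only after both replies arrive."""

    def __init__(self):
        self.local_tree = None
        self.remote_tree = None
        self.comparison_queued = False  # mirrors the "Queueing comparison" log

    def on_tree(self, source, tree):
        # Mirrors the "Stored local tree" / "Stored remote tree" logs.
        if source == "local":
            self.local_tree = tree
        else:
            self.remote_tree = tree
        if self.local_tree is not None and self.remote_tree is not None:
            self.comparison_queued = True


tracker = TreeRequestTracker()
tracker.on_tree("local", {"range-a": "hash-1"})
# If the remote replica never responds, the repair appears to hang here.
assert not tracker.comparison_queued
tracker.on_tree("remote", {"range-a": "hash-2"})
assert tracker.comparison_queued  # both trees stored, so comparison is queued
```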