Creating a keyspace fails
I just started with Cassandra. Currently I'm reading the following tutorial about CQL: http://www.datastax.com/docs/1.1/dml/using_cql#use-cql But I already fail when trying to create a keyspace:

$ ./cqlsh --cql3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor='1';
Bad Request: line 1:82 mismatched input ':' expecting '='
Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.

What is wrong?
Re: Creating a keyspace fails
cqlsh> CREATE KEYSPACE demodb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> use demodb;
cqlsh:demodb>

On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven paul.van.ho...@googlemail.com wrote:
> CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor='1';
Re: How to store large columns?
But these keys have the same prefix. So they will be distributed to the same node, right?

2013/1/21 Jason Brown jasbr...@netflix.com
> The reason for multiple keys (and, by extension, multiple columns) is to better distribute the write/read load across the cluster, as keys will (hopefully) be distributed to different nodes. This helps to avoid hot spots.
> Hope this helps,
> -Jason Brown
> Netflix
>
> From: Sávio Teles [savio.te...@lupa.inf.ufg.br]
> Sent: Monday, January 21, 2013 9:51 AM
> To: user@cassandra.apache.org
> Subject: Re: How to store large columns?
>
> Astyanax splits large objects into multiple keys. Is it a good idea? Is it better to split into multiple columns? Thanks
>
> 2013/1/21 Keith Wright kwri...@nanigans.com
>> This may be helpful: https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store
>
> From: Vegard Berget p...@fantasista.no
> Date: Monday, January 21, 2013 8:35 AM
> To: user@cassandra.apache.org
> Subject: Re: How to store large columns?
>
> Hi,
> You could split it into multiple columns on the client side:
> RowKeyData: Part1: [1mb], Part2: [1mb], Part3: [1mb] ... PartN: [1mb]
> Now you can use multiple get() calls in parallel to fetch the parts back and then join them into one file.
> I _think_ the new CQL3 protocol may not have the same limitation, but I have never tried large columns there, so someone with more experience than me will have to confirm this.
> .vegard,
>
> ----- Original Message -----
> Sent: Mon, 21 Jan 2013 11:16:40 -0200
> Subject: How to store large columns?
>
> We wish to store a column in a row with size larger than thrift_framed_transport_size_in_mb. But Thrift has a maximum frame size, configured by thrift_framed_transport_size_in_mb in cassandra.yaml. So, how can we store columns with size larger than thrift_framed_transport_size_in_mb? Increasing this value does not solve the problem, since we have columns with varying sizes.

--
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
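Vegard's client-side scheme (one row key, fixed-size parts in separate columns, reassembled after parallel gets) can be sketched roughly as below. This is an illustrative sketch only, not Astyanax's actual chunk layout; the part-naming scheme and CHUNK_SIZE are assumptions.

```python
# Sketch of client-side chunking: split a large blob into parts small
# enough to fit under the Thrift frame limit, store each part under its
# own column name, and reassemble after fetching. Hypothetical naming.
CHUNK_SIZE = 1024 * 1024  # 1 MB, kept well below thrift_framed_transport_size_in_mb

def split_blob(data, chunk_size=CHUNK_SIZE):
    """Return {column_name: chunk} pairs for one row key."""
    return {
        "part%05d" % i: data[off:off + chunk_size]
        for i, off in enumerate(range(0, len(data), chunk_size))
    }

def join_blob(columns):
    """Reassemble chunks (fetched in any order) by sorted column name."""
    return b"".join(columns[name] for name in sorted(columns))

blob = b"x" * (3 * CHUNK_SIZE + 123)
parts = split_blob(blob)
assert len(parts) == 4          # three full chunks plus a 123-byte tail
assert join_blob(parts) == blob  # round-trips losslessly
```

Each column write then stays under the frame limit regardless of the total object size, which is why raising thrift_framed_transport_size_in_mb is unnecessary.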
Re: How to store large columns?
Hi,
No, the keys are hashed to be distributed, at least if you use RandomPartitioner. From http://www.datastax.com/docs/1.0/cluster_architecture/partitioning: "To distribute the data evenly across the number of nodes, a hashing algorithm creates an MD5 hash value of the row key."
.vegard,

----- Original Message -----
Sent: Tue, 22 Jan 2013 09:40:19 -0200
Subject: Re: How to store large columns?

> But these keys have the same prefix. So they will be distributed to the same node, right?
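Vegard's point can be illustrated with a quick sketch. RandomPartitioner's exact token math (a non-negative big integer over the 2**127 ring) is simplified here; the sketch only shows that keys sharing a prefix do not hash to nearby tokens.

```python
# Illustrative only: keys with a common prefix get unrelated MD5-based
# tokens, so chunk rows like "file:part1" / "file:part2" spread across
# the cluster rather than landing on one node.
import hashlib

def token(row_key):
    # Simplified stand-in for RandomPartitioner: the MD5 digest of the
    # raw key, interpreted as an integer.
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")

t1 = token(b"file:part1")
t2 = token(b"file:part2")
assert t1 != t2  # shared prefix, completely different tokens
```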
Re: Creating a keyspace fails
Okay, that worked. Why is the statement from the tutorial wrong? I mean, why would a company like DataStax post something like this?

2013/1/22 Jason Wee peich...@gmail.com:
> cqlsh> CREATE KEYSPACE demodb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> cqlsh> use demodb;
> cqlsh:demodb>
Re: How to store large columns?
You're right, Vegard! Thanks

2013/1/22 Vegard Berget p...@fantasista.no
> Hi,
> No, the keys are hashed to be distributed, at least if you use RandomPartitioner. From http://www.datastax.com/docs/1.0/cluster_architecture/partitioning: "To distribute the data evenly across the number of nodes, a hashing algorithm creates an MD5 hash value of the row key."
> .vegard,
Re: Creating a keyspace fails
Maybe a typo, or they forgot to update the doc... but anyway, you can use the help command when you are in cqlsh. For example:

cqlsh> HELP CREATE_KEYSPACE;
CREATE KEYSPACE ksname
  WITH replication = {'class':'strategy' [,'option':val]};

On Tue, Jan 22, 2013 at 8:06 PM, Paul van Hoven paul.van.ho...@googlemail.com wrote:
> Okay, that worked. Why is the statement from the tutorial wrong? I mean, why would a company like DataStax post something like this?
node down = log explosion?
I have a Cassandra 1.1.7 cluster with 4 nodes in 2 datacenters (2+2). Replication is configured as DC1:2,DC2:2 (i.e. every node holds the entire data). I am load-testing counter increments at the rate of about 10k per second. All writes are directed to the two nodes in DC1 (the DC2 nodes are basically backup). In total there are 100 separate clients executing 1-2 batch updates per second. We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. a node that was handling half of the incoming writes). This led to a complete explosion of logs on the remaining alive node in DC1. There are hundreds of megabytes of logs within an hour, all basically saying the same thing:

ERROR [ReplicateOnWriteStage:5653390] 2013-01-22 12:44:33,611 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReplicateOnWriteStage:5653390,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1275)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
        at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:311)
        at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:585)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1271)
        ... 3 more

The logs are completely swamped with this and are thus unusable. Of course logs should report errors, but we don't need hundreds of megabytes of this :) Is there anything that can be done to reduce the amount of this spam? In addition to making the logs unusable, I strongly suspect this spam makes the server unable to accept as many increments as it otherwise could.
-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Creating a keyspace fails
Alright. Thanks for your quick help. :)

2013/1/22 Jason Wee peich...@gmail.com:
> Maybe a typo, or they forgot to update the doc... but anyway, you can use the help command when you are in cqlsh.
Is this how to read the output of nodetool cfhistograms?
The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. Using this example output, should I read it as: my reads all took either 1 or 2 sstables; and separately, I had write latencies of 3, 7, 19; and separately I had read latencies of 2, 8, 69, etc.? In other words, each row isn't really a row, i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0
Re: Is this how to read the output of nodetool cfhistograms?
On 2013-01-22, at 8:59 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together. In other words, each row isn't really a row, i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count. Is that right?

Correct. A number in any of the metric columns is a count value bucketed in the offset on that row. There are no relationships between other columns on the same row. So your first row says 16033 reads were satisfied by 1 sstable. The other metrics (for example, the latency of those reads) are reflected in the histogram under Read Latency, under various other bucketed offsets.
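The independence of the columns can be made concrete with a small sketch: each metric column is its own histogram keyed by the Offset column, so you read the table one column at a time. The numbers below are from this thread's example output.

```python
# Each tuple is one cfhistograms row:
# (offset, sstables, write_latency, read_latency, row_size, column_count).
# Zero rows are omitted for brevity; only per-column reads are meaningful.
rows = [
    (1, 16033, 0,    0, 0,    0),
    (2,   303, 0,    0, 0,    1),
    (8,     0, 0,    2, 0,    0),
    (10,    0, 0,    0, 0, 6261),
    (12,    0, 0,    2, 0,  117),
    (14,    0, 0,    8, 0,    0),
    (17,    0, 3,   69, 0,  255),
    (20,    0, 7,  163, 0,    0),
    (24,    0, 19, 1369, 0,   0),
]

# Column 1 alone: 16033 reads touched 1 sstable, 303 touched 2.
sstable_hist = {off: n for off, n, *_ in rows if n}
assert sstable_hist == {1: 16033, 2: 303}

# Column 2 alone: 3 writes in the 17 bucket, 7 in 20, 19 in 24 --
# completely unrelated to the sstable counts on the same rows.
write_hist = {off: w for off, _, w, *_ in rows if w}
assert write_hist == {17: 3, 20: 7, 24: 19}
```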
Re: Is this how to read the output of nodetool cfhistograms?
Thank you! Since this is a very non-standard way to display data, it might be worth a better explanation in the various online documentation sets. Thank you again.

Brian

On Tue, Jan 22, 2013 at 9:19 AM, Mina Naguib mina.nag...@adgear.com wrote:
> Correct. A number in any of the metric columns is a count value bucketed in the offset on that row. There are no relationships between other columns on the same row.
Re: Is this how to read the output of nodetool cfhistograms?
This was described in good detail here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

On Tue, Jan 22, 2013 at 9:41 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> Thank you! Since this is a very non-standard way to display data, it might be worth a better explanation in the various online documentation sets.
Re: Is this how to read the output of nodetool cfhistograms?
Indeed, but how many Cassandra users have the good fortune to stumble across that page? I'm just saying that the explanation of the very powerful nodetool commands should be more front and center.

Brian

On Tue, Jan 22, 2013 at 10:03 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
> This was described in good detail here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
Re: Creating a keyspace fails
You were most likely looking at the wrong documentation. The syntax for CQL3 changed between Cassandra 1.1 and 1.2. When I google "cassandra CQL3" the first result is the Cassandra 1.1 documentation about CQL3, which is wrong for 1.2. Make sure you are looking at the documentation for the version you are using. It might also be nice for DataStax to update the 1.1 documentation with a warning.

--
Colin Blower

On 01/22/2013 04:06 AM, Paul van Hoven wrote:
> Okay, that worked. Why is the statement from the tutorial wrong? I mean, why would a company like DataStax post something like this?
Cassandra source code explained
Hi everyone, I am looking for any place where the Cassandra source code structure is explained. Are there any articles / wikis available?

Kind regards,
Radek Gruchalski
radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com

Confidentiality: This communication is intended for the above-named person and may be confidential and/or legally privileged. If it has come to you in error you must take no action based on it, nor must you copy or show it to anyone; please delete/destroy and inform the sender immediately.
Re: Cassandra source code explained
http://wiki.apache.org/cassandra/ArchitectureInternals

From: Radek Gruchalski radek.gruchal...@portico.io
Date: Tuesday, January 22, 2013 9:07 AM
To: user@cassandra.apache.org
Subject: Cassandra source code explained

> Hi everyone, I am looking for any place where the Cassandra source code structure is explained. Are there any articles / wikis available?
Re: Cassandra source code explained
Thank you. I found this, but I was hoping there was something broader out there. This will have to be enough.

Kind regards,
Radek Gruchalski

On Tuesday, 22 January 2013 at 18:08, Michael Kjellman wrote:
> http://wiki.apache.org/cassandra/ArchitectureInternals
Re: Is this how to read the output of nodetool cfhistograms?
I agree that Cassandra cfhistograms is probably the most bizarre metrics output I have ever come across, although it's extremely useful. I believe the offset is actually the value being tracked (the x-axis on a traditional histogram) and the number under each column is how many times that value has been recorded (the y-axis on a traditional histogram). Your write latencies are 17, 20 and 24 (microseconds?): 3 writes took 17, 7 writes took 20 and 19 writes took 24. Correct me if I am wrong. Thanks.

-Wei

From: Brian Tarbox tar...@cabotresearch.com
To: user@cassandra.apache.org
Sent: Tuesday, January 22, 2013 7:27 AM
Subject: Re: Is this how to read the output of nodetool cfhistograms?

> Indeed, but how many Cassandra users have the good fortune to stumble across that page? I'm just saying that the explanation of the very powerful nodetool commands should be more front and center.
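Reading the offsets this way (offset = value, column cell = count), simple summary statistics fall out of the buckets directly. A rough sketch, using the write-latency numbers from this thread:

```python
# Recover an approximate count, max and mean write latency from the
# bucketed histogram (offset_us -> count), per Wei's x-axis/y-axis
# reading. Bucket midpoints are ignored for simplicity, so the mean
# is only approximate.
write_hist = {17: 3, 20: 7, 24: 19}

total = sum(write_hist.values())
mean_us = sum(off * n for off, n in write_hist.items()) / total

assert total == 29          # 3 + 7 + 19 writes in total
assert max(write_hist) == 24  # slowest bucketed latency
assert 22 < mean_us < 23    # (17*3 + 20*7 + 24*19) / 29, roughly 22.3 us
```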
Re: Creating a keyspace fails
I sent a note to our docs team to add a warning/note to the docs there about the difference between the syntax in 1.1 and 1.2. Thanks!

On Tue, Jan 22, 2013 at 10:49 AM, Colin Blower cblo...@barracuda.com wrote:
> You were most likely looking at the wrong documentation. The syntax for CQL3 changed between Cassandra 1.1 and 1.2. When I google "cassandra CQL3" the first result is the Cassandra 1.1 documentation about CQL3, which is wrong for 1.2. Make sure you are looking at the documentation for the version you are using.

--
Tyler Hobbs
DataStax
http://datastax.com/
Re: Cassandra timeout whereas it is not much busy
I have seen logs about that. I didn't worry much, since the GC of the jvm was not under pressure. When cassandra logs a ParNew event from the GCInspector, that is time the server was paused / frozen. CMS events have a very small pause, but they take a non-trivial amount of CPU time. If you are logging a lot of GC events you should look into it. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 22/01/2013, at 3:28 AM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote: On 17 Jan 2013, at 05:00, aaron morton aa...@thelastpickle.com wrote: Check the disk utilisation using iostat -x 5 If you are on a VM / in the cloud check for CPU steal. Check the logs for messages from the GCInspector, the ParNew events are times the JVM is paused. I have seen logs about that. I didn't worry much, since the GC of the jvm was not under pressure. As far as I understand, unless a CF is continuously flushed, it should not be a major issue, should it? I don't know for sure if there were a lot of flushes though, since my nodes were not properly monitored. Look at the times dropped messages are logged and try to correlate them with other server events. I tried that without much success. I have graphs on cacti though, so it is quite hard to visualize when things happen simultaneously on several graphs. If you have a lot of secondary indexes, or a lot of memtables flushing at the same time, you may be blocking behind the global Switch Lock. If you use secondary indexes make sure the memtable_flush_queue_size is set correctly, see the comments in the yaml file. I have no secondary indexes. If you have a lot of CF's flushing at the same time, and there are no messages from the MeteredFlusher, it may be that the log segment is too big for the number of CF's you have. When the segment needs to be recycled all dirty CF's are flushed; if you have a lot of CF's this can result in blocking around the switch lock.
Try reducing the commitlog_segment_size_in_mb so that fewer CF's are flushed. What is a lot? We have 26 CF's. 9 are barely used. 15 contain time series data (cassandra rocks with them), of which only 3 have from 1 to 10 reads or writes per sec. 1 is quite hot (200 reads/s) and is mainly used for its bloom filter (whose disk size is about 1G). And 1 is also hot but used only for writes (it has the same big bloom filter, which I am about to remove since it is useless). BTW, thanks for the pointers. I have not yet tried to put our nodes under pressure. But when I do, I'll look at those pointers closely. Nicolas Hope that helps - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 17/01/2013, at 10:30 AM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote: Hi, I have a strange behavior I am not able to understand. I have 6 nodes with cassandra-1.0.12. Each node has 8G of RAM. I have a replication factor of 3. --- my story is maybe too long, so I'm trying a shorter version here, while keeping what I wrote in case someone has the patience to read my bad english ;) I got into a situation where my cluster was generating a lot of timeouts on our frontend, whereas I could not see any major trouble in the internal stats. Actually cpu and read/write counts on the column families were quite low. A mess, until I switched from java7 to java6 and forced the use of jamm. After the switch, cpu and read/write counts went up again, timeouts gone. I have seen this behavior while reducing the xmx too. What could be blocking cassandra from utilizing the whole resources of the machine? Are there metrics I didn't see which could explain this? --- Here is the long story. When I first set my cluster up, I blindly gave 6G of heap to the cassandra nodes, thinking that the more a java process has, the smoother it runs, while keeping some RAM for the disk cache. We got a new feature deployed, and things went to hell, some machines up to 60% wa.
I give credit to cassandra because there were not that many timeouts received on the web frontend; it was kind of slow but it was kind of working. With some optimizations, we reduced the pressure of the new feature, but it was still at 40% wa. At that time I didn't have much monitoring, just heap and cpu. I read some articles on how to tune, and I learned that the disk cache is quite important because cassandra relies on it to be the read cache. So I tried many xmx values, and 3G seems to be kind of the lowest possible. So on 2 of the 6 nodes, I set the xmx to 3.3G. Amazingly, I saw the wa go down to 10%. Quite happy with that, I changed the xmx to 3.3G on every node. But then things really went to hell, a lot of timeouts on the frontend. It was not working at all. So I rolled back. After some time,
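Aaron's point about ParNew pauses can be checked mechanically against the logs. A minimal sketch follows; the log line shape is an assumption modeled on GCInspector output from the Cassandra 1.0/1.1 era, so verify the regex against your own system.log before relying on it.

```python
import re

# Assumed GCInspector line shape (an assumption -- check your system.log):
#   "GC for ParNew: 412 ms for 2 collections, 123456 used; max is 999999"
PAUSE_RE = re.compile(r"GC for (ParNew|ConcurrentMarkSweep): (\d+) ms")

def total_pause_ms(log_lines):
    """Sum the reported GC time per collector across a log excerpt."""
    totals = {}
    for line in log_lines:
        m = PAUSE_RE.search(line)
        if m:
            collector, ms = m.group(1), int(m.group(2))
            totals[collector] = totals.get(collector, 0) + ms
    return totals

sample = [
    "INFO GCInspector GC for ParNew: 412 ms for 2 collections, 1 used; max is 9",
    "INFO GCInspector GC for ParNew: 95 ms for 1 collections, 1 used; max is 9",
]
print(total_pause_ms(sample))  # {'ParNew': 507}
```

Totaling ParNew time per minute and overlaying it on the frontend timeout graph is one way to correlate the pauses with the timeouts Nicolas describes.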
Re: Concurrent write performance
Background: see my talk here http://www.datastax.com/events/cassandrasummit2012/presentations Mutations to a row are isolated. In practice this means that simultaneous writes to the same row are possible, however the first write thread to complete wins and the other threads start their work again. So if you have one very hot row you will see less throughput as the writers will have to do some re-work. I did a little test here http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance on slide 30. sort the columns at the same time? Don't worry about sorting. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 22/01/2013, at 4:40 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Do you experience any performance problems? This will be the last thing to look at. Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Follow us on Twitter: @adforminsider Take a ride with Adform's Rich Media Suite Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. From: Jay Svc [mailto:jaytechg...@gmail.com] Sent: Monday, January 21, 2013 17:28 To: user@cassandra.apache.org Subject: Concurrent write performance Folks, I would like to write (insert or update) to a single row in a column family. I have concurrent requests which will write to a single row.
Do we see any performance implications because of concurrent writes to a single row, where the comparator has to sort the columns at the same time? Please share your thoughts. Thanks, Jay
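Aaron's description above — first writer wins, the losers redo their work — is essentially optimistic concurrency. A toy model of that effect (not Cassandra internals) shows why one hot row costs more total work than the same writes spread over many rows:

```python
# Toy model of the re-work Aaron describes -- NOT Cassandra internals.
# Each round, `concurrency` writers race on one hot row; one commits and
# the losers redo their mutation, so total attempts exceed total writes.
def simulate_hot_row(num_writes, concurrency):
    attempts = 0
    pending = num_writes
    while pending > 0:
        racing = min(concurrency, pending)
        attempts += racing  # every racing writer does the mutation work
        pending -= 1        # ...but only one mutation commits per round
    return attempts

# No contention: 100 writes cost exactly 100 attempts.
print(simulate_hot_row(100, 1))  # 100
# Four writers hammering one hot row: 394 attempts for the same 100 writes.
print(simulate_hot_row(100, 4))  # 394
```

The throughput loss grows with the number of threads racing on the row, which matches the "less throughput on one very hot row" observation in the thread.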
Re: sstable2json had random behavior
William, If the solution from Binh works for you can you please submit a ticket to https://issues.apache.org/jira/browse/CASSANDRA The error message could be better if that is the case. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 22/01/2013, at 9:16 AM, Binh Nguyen binhn...@gmail.com wrote: Hi William, I also saw this one before, but in my case it always happened when I had only the Data and Index files. The problem goes away when I have all the other files (Compression, Filter...) On Mon, Jan 21, 2013 at 11:36 AM, William Oberman ober...@civicscience.com wrote: I'm running 1.1.6 from the datastax repo. I ran sstable2json and got the following error:

Exception in thread main java.io.IOError: java.io.IOException: dataSize of 7020023552240793698 starting at 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db length 7502161255
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:156)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:70)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:187)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:151)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:143)
    at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:309)
    at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:340)
    at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:353)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:418)
Caused by: java.io.IOException: dataSize of 7020023552240793698 starting at 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db length 7502161255
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:115)
    ... 9 more

I ran it again, and it didn't happen. This makes me worried :-) Does anyone else ever see this class of error, and does it ever disappear for them?
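The exception above is a sanity check firing: a row's serialized dataSize, as read from the file, would extend past the end of the sstable, which suggests corruption or parsing the file with the wrong format assumptions. A hedged sketch of that kind of check (illustrative only, not the actual SSTableIdentityIterator code):

```python
import os
import struct
import tempfile

# Illustrative sketch of the sanity check behind the error in this thread
# (not the real Cassandra code): a length field read from a data file must
# fit within the file, otherwise the data is corrupt or misparsed.
def read_sized_record(path):
    with open(path, "rb") as f:
        (data_size,) = struct.unpack(">q", f.read(8))  # 8-byte big-endian length
        start = f.tell()
        file_len = os.fstat(f.fileno()).st_size
        if data_size < 0 or start + data_size > file_len:
            raise IOError("dataSize of %d starting at %d would be larger than "
                          "file length %d" % (data_size, start, file_len))
        return f.read(data_size)

# A well-formed record round-trips cleanly:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack(">q", 5) + b"hello")
print(read_sized_record(tmp.name))  # b'hello'
```

An implausibly huge dataSize like the 7020023552240793698 in the trace is the classic symptom of the reader being misaligned and interpreting arbitrary bytes as a length field.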
Re: Cassandra timeout whereas it is not much busy
On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote: Here is the long story. After some long useless staring at the monitoring graphs, I gave a try to using the openjdk 6b24 rather than openjdk 7u9 OpenJDK 6 and 7 are both counter-recommended with regards to Cassandra. I've heard reports of mysterious behavior like the behavior you describe, when using OpenJDK 7. Try using the Sun/Oracle JVM? Is your JNA working? =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: Cassandra pending compaction tasks keeps increasing
Thanks Aaron and Jim for your reply. The data import is done. We have about 135G on each node and it's about 28K SStables. For normal operation, we only have about 90 writes per second, but when I ran nodetool compactionstats, it remains at 9 and hardly changes. I guess it's just an estimated number. When I ran the histogram,

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1           2644              0             0         0      18660057
2           8204              0             0         0       9824270
3          11198              0             0         0       6968475
4           4269              6             0         0       5510745
5            517             29             0         0       4595205

You can see about half of the reads result in 3 SSTables. The majority of read latencies are under 5ms, only a dozen are over 10ms. We haven't fully turned on reads yet, only 60 reads per second. We see about 20 read timeouts during the past 12 hours. Not a single warning from the Cassandra log. Is it normal for Cassandra to time out some requests? We set the rpc timeout to 1s; it shouldn't time out any of them, should it? Thanks. -Wei From: aaron morton aa...@thelastpickle.com To: user@cassandra.apache.org Sent: Monday, January 21, 2013 12:21 AM Subject: Re: Cassandra pending compaction tasks keeps increasing The main guarantee LCS gives you is that most reads will only touch 1 sstable http://www.datastax.com/dev/blog/when-to-use-leveled-compaction If compaction is falling behind this may not hold. nodetool cfhistograms tells you how many SSTables were read from for reads. It's a recent histogram that resets each time you read from it. Also, parallel levelled compaction in 1.2 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/01/2013, at 7:49 AM, Jim Cistaro jcist...@netflix.com wrote: 1) In addition to iostat, dstat is a good tool to see what kind of disk throughput you are getting. That would be one thing to monitor. 2) For LCS, we also see pending compactions skyrocket. During load, LCS will create a lot of small sstables which will queue up for compaction.
3) For us the biggest concern is not how high the pending count gets, but how often it gets back down near zero. If your load is something you can do in segments or pause, then you can see how fast the cluster recovers on the compactions. 4) One thing which we tune per cluster is the size of the files. Increasing this from 5MB can sometimes improve things. But I forget if we have ever changed this after starting data load. Is your cluster receiving read traffic during this data migration? If so, I would say that read latency is your best measure. If the high number of SSTables waiting to compact is not hurting your reads, then you are probably ok. Since you are on SSD, there is a good chance the compactions are not hurting you. As for compaction throughput, we set ours high for SSD. You usually won't use it all because the compactions are usually single threaded. Dstat will help you measure this. I hope this helps, jc From: Wei Zhu wz1...@yahoo.com Reply-To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com Date: Friday, January 18, 2013 12:10 PM To: Cassandra user group user@cassandra.apache.org Subject: Cassandra pending compaction tasks keeps increasing Hi, When I run nodetool compactionstats I see the number of pending tasks keep going up steadily. I tried to increase the compaction throughput by using nodetool setcompactionthroughput; I even went to the extreme of setting it to 0 to disable the throttling. I checked iostat and we have SSD for data; the disk util is less than 5%, which means it's not I/O bound, and CPU is also less than 10%. We are using leveled compaction and are in the process of migrating data. We have 4500 writes per second and very few reads. We have about 70G of data now and will grow to 150G when the migration finishes. We only have one CF and right now the number of SSTables is around 15000; write latency is still under 0.1ms. Is there anything to be concerned about? Or anything I can do to reduce the number of pending compactions? Thanks. -Wei
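The SSTables-per-read histogram Wei posted can be reduced to a cumulative distribution, which is the number that matters for read latency under LCS. Using the counts from this thread:

```python
# SSTables-per-read counts copied from the cfhistograms output in this thread:
# key = sstables touched by a read, value = number of reads in that bucket.
sstables_per_read = {1: 2644, 2: 8204, 3: 11198, 4: 4269, 5: 517}

total = sum(sstables_per_read.values())  # 26832 reads sampled
cumulative = 0
for sstables, count in sorted(sstables_per_read.items()):
    cumulative += count
    print("<= %d sstables: %5.1f%% of reads" % (sstables, 100.0 * cumulative / total))
```

This prints roughly 9.9% / 40.4% / 82.2% / 98.1% / 100.0% for 1 through 5 sstables — far from the "most reads touch 1 sstable" guarantee, consistent with Aaron's point that the guarantee does not hold while compaction is behind.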
Re: sstable2json had random behavior
No, I have the other files unfortunately, and I had it fail once and succeed every time after. I'm tracking the external information of sstable2json more carefully now (exit status, stdout, stderr), so hopefully if it happens again I can be more help. will On Tue, Jan 22, 2013 at 3:38 PM, aaron morton aa...@thelastpickle.com wrote: William, If the solution from Binh works for you can you please submit a ticket to https://issues.apache.org/jira/browse/CASSANDRA The error message could be better if that is the case. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 22/01/2013, at 9:16 AM, Binh Nguyen binhn...@gmail.com wrote: Hi William, I also saw this one before, but in my case it always happened when I had only the Data and Index files. The problem goes away when I have all the other files (Compression, Filter...) On Mon, Jan 21, 2013 at 11:36 AM, William Oberman ober...@civicscience.com wrote: I'm running 1.1.6 from the datastax repo.
I ran sstable2json and got the following error:

Exception in thread main java.io.IOError: java.io.IOException: dataSize of 7020023552240793698 starting at 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db length 7502161255
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:156)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:70)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:187)
    at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:151)
    at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:143)
    at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:309)
    at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:340)
    at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:353)
    at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:418)
Caused by: java.io.IOException: dataSize of 7020023552240793698 starting at 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db length 7502161255
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:115)
    ... 9 more

I ran it again, and it didn't happen. This makes me worried :-) Does anyone else ever see this class of error, and does it ever disappear for them?
Re: node down = log explosion?
On Tue, Jan 22, 2013 at 5:03 AM, Sergey Olefir solf.li...@gmail.com wrote: I am load-testing counter increments at the rate of about 10k per second. Do you need highly performant counters that count accurately, without meaningful chance of over-count? If so, Cassandra's counters are probably not ideal. We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. the node that was handling half of the incoming writes). ... This led to a complete explosion of logs on the remaining alive node in DC1. I agree, this level of exception logging during replicateOnWrite (which is called every time a counter is incremented) seems like a bug. I would file a bug at the Apache JIRA. =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: Cassandra pending compaction tasks keeps increasing
What version are you using? Are you seeing any compaction related assertions in the logs? Might be https://issues.apache.org/jira/browse/CASSANDRA-4411 We experienced this problem of the count only decreasing to a certain number and then stopping. If you are idle, it should go to 0. I have not seen it overestimate for zero, only for non-zero amounts. As for timeouts etc, you will need to look at things like nodetool tpstats to see if you have pending transactions queueing up. Jc From: Wei Zhu wz1...@yahoo.com Reply-To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com Date: Tuesday, January 22, 2013 12:56 PM To: user@cassandra.apache.org Subject: Re: Cassandra pending compaction tasks keeps increasing Thanks Aaron and Jim for your reply. The data import is done. We have about 135G on each node and it's about 28K SStables. For normal operation, we only have about 90 writes per second, but when I ran nodetool compactionstats, it remains at 9 and hardly changes. I guess it's just an estimated number. When I ran the histogram,

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1           2644              0             0         0      18660057
2           8204              0             0         0       9824270
3          11198              0             0         0       6968475
4           4269              6             0         0       5510745
5            517             29             0         0       4595205

You can see about half of the reads result in 3 SSTables. The majority of read latencies are under 5ms, only a dozen are over 10ms. We haven't fully turned on reads yet, only 60 reads per second. We see about 20 read timeouts during the past 12 hours. Not a single warning from the Cassandra log. Is it normal for Cassandra to time out some requests? We set the rpc timeout to 1s; it shouldn't time out any of them, should it? Thanks.
-Wei From: aaron morton aa...@thelastpickle.com To: user@cassandra.apache.org Sent: Monday, January 21, 2013 12:21 AM Subject: Re: Cassandra pending compaction tasks keeps increasing The main guarantee LCS gives you is that most reads will only touch 1 sstable http://www.datastax.com/dev/blog/when-to-use-leveled-compaction If compaction is falling behind this may not hold. nodetool cfhistograms tells you how many SSTables were read from for reads. It's a recent histogram that resets each time you read from it. Also, parallel levelled compaction in 1.2 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 20/01/2013, at 7:49 AM, Jim Cistaro jcist...@netflix.com wrote: 1) In addition to iostat, dstat is a good tool to see what kind of disk throughput you are getting. That would be one thing to monitor. 2) For LCS, we also see pending compactions skyrocket. During load, LCS will create a lot of small sstables which will queue up for compaction. 3) For us the biggest concern is not how high the pending count gets, but how often it gets back down near zero. If your load is something you can do in segments or pause, then you can see how fast the cluster recovers on the compactions. 4) One thing which we tune per cluster is the size of the files. Increasing this from 5MB can sometimes improve things. But I forget if we have ever changed this after starting data load. Is your cluster receiving read traffic during this data migration? If so, I would say that read latency is your best measure. If the high number of SSTables waiting to compact is not hurting your reads, then you are probably ok. Since you are on SSD, there is a good chance the compactions are not hurting you. As for compaction throughput, we set ours high for SSD. You usually won't use it all because the compactions are usually single threaded. Dstat will help you measure this. I hope this helps, jc From: Wei Zhu wz1...@yahoo.com Reply-To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com Date: Friday, January 18, 2013 12:10 PM To: Cassandra user group user@cassandra.apache.org Subject: Cassandra pending compaction tasks keeps increasing Hi, When I run nodetool compactionstats I see the number of pending tasks keep going up steadily. I tried to increase the
Re: node down = log explosion?
Do you have a suggestion as to what could be a better fit for counters? Something that can also replicate across DCs and survive link breakdown between nodes (across DCs)? (and no, I don't need 100.00% precision (although it would be nice obviously), I just need to be pretty close for the values of pretty) On the subject of a bug report -- I probably will -- but I'll wait a bit for more info here, perhaps there's some configuration or something that I just don't know about. Rob Coli wrote: On Tue, Jan 22, 2013 at 5:03 AM, Sergey Olefir <solf.lists@> wrote: I am load-testing counter increments at the rate of about 10k per second. Do you need highly performant counters that count accurately, without meaningful chance of over-count? If so, Cassandra's counters are probably not ideal. We wanted to test what happens if one node goes down, so we brought one node down in DC1 (i.e. the node that was handling half of the incoming writes). ... This led to a complete explosion of logs on the remaining alive node in DC1. I agree, this level of exception logging during replicateOnWrite (which is called every time a counter is incremented) seems like a bug. I would file a bug at the Apache JIRA. =Rob -- =Robert Coli AIMGTALK - rcoli@ YAHOO - rcoli.palominob SKYPE - rcoli_palominodb -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932p7584954.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: node down = log explosion?
On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir solf.li...@gmail.com wrote: Do you have a suggestion as to what could be a better fit for counters? Something that can also replicate across DCs and survive link breakdown between nodes (across DCs)? (and no, I don't need 100.00% precision (although it would be nice obviously), I just need to be pretty close for the values of pretty) In that case, Cassandra counters are probably fine. On the subject of bug report -- I probably will -- but I'll wait a bit for more info here, perhaps there's some configuration or something that I just don't know about. Excepting on replicateOnWrite stage seems pretty unambiguous to me, and unexpected. YMMV? =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
Re: Cassandra source code explained
On Wed 23 Jan 2013 01:10:58 AM CST, Radek Gruchalski wrote: Thank you. I found this but was hoping that there's anything broader out there. This will have to be enough. Kind regards, Radek Gruchalski radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com 00447889948663 *Confidentiality:* This communication is intended for the above-named person and may be confidential and/or legally privileged. If it has come to you in error you must take no action based on it, nor must you copy or show it to anyone; please delete/destroy and inform the sender immediately. On Tuesday, 22 January 2013 at 18:08, Michael Kjellman wrote: http://wiki.apache.org/cassandra/ArchitectureInternals From: Radek Gruchalski radek.gruchal...@portico.io Reply-To: user@cassandra.apache.org Date: Tuesday, January 22, 2013 9:07 AM To: user@cassandra.apache.org Subject: Cassandra source code explained Hi everyone, I am looking for any places where the Cassandra source code structure would be explained. Are there any articles / wiki available? Kind regards, Radek Gruchalski radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com
Here are two slide decks to get you started: http://www.slideshare.net/gdusbabek/getting-to-know-the-cassandra-codebase http://www.slideshare.net/gdusbabek/cassandra-codebase-2011
Re: node down = log explosion?
Replication is configured as DC1:2,DC2:2 (i.e. every node holds the entire data). I really recommend using RF 3. The error is the coordinator node protecting itself. Basically it cannot handle the volume of local writes + the writes for HH. The number of in-flight hints is greater than: private static volatile int maxHintsInProgress = 1024 * Runtime.getRuntime().availableProcessors(); You may be able to work around this by reducing the max_hint_window_in_ms in the yaml file so that hints stop being recorded once a node has been down for more than, say, 1 minute. Anyways, I would say your test showed that the current cluster does not have sufficient capacity to handle the write load with one node down and HH enabled at the current level. You can either add more nodes, use nodes with more cores, adjust the HH settings, or reduce the throughput. On the subject of bug report -- I probably will -- but I'll wait a bit for
Perhaps the excessive logging could be handled better; please add a ticket when you have time. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 23/01/2013, at 2:12 PM, Rob Coli rc...@palominodb.com wrote: On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir solf.li...@gmail.com wrote: Do you have a suggestion as to what could be a better fit for counters? Something that can also replicate across DCs and survive link breakdown between nodes (across DCs)? (and no, I don't need 100.00% precision (although it would be nice obviously), I just need to be pretty close for the values of pretty) In that case, Cassandra counters are probably fine. On the subject of bug report -- I probably will -- but I'll wait a bit for more info here, perhaps there's some configuration or something that I just don't know about. Excepting on replicateOnWrite stage seems pretty unambiguous to me, and unexpected. YMMV? =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
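The threshold Aaron quotes scales with core count, so the point where the coordinator starts shedding hints differs per machine. A quick illustrative mirror of the same computation (the Java constant is quoted in the thread; the Python version is just for exploring the numbers):

```python
import multiprocessing

# Mirrors the Java constant quoted above:
#   maxHintsInProgress = 1024 * Runtime.getRuntime().availableProcessors()
# Past this many in-flight hints, the coordinator rejects writes to
# protect itself -- the log explosion described in this thread.
def max_hints_in_progress(cores):
    return 1024 * cores

print(max_hints_in_progress(8))  # an 8-core coordinator tolerates 8192
print(max_hints_in_progress(multiprocessing.cpu_count()))
```

At 10k counter increments per second with a node down, a small-core coordinator hits this ceiling quickly, which is why "nodes with more cores" is one of the listed remedies.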
Re: LCS not removing rows with all TTL expired columns
It turns out that having gc_grace=0 isn't required to produce the problem. My colleague did a lot of digging into the compaction code and we think he's found the issue. It's detailed in https://issues.apache.org/jira/browse/CASSANDRA-5182 Basically, tombstones for a row will not be removed from an SSTable during compaction if the row appears in other SSTables; however, the compaction code checks the bloom filters to make this determination. Since this data is rarely read we had the bloom_filter_fp_ratio set to 1.0, which makes rows seem to appear in every SSTable as far as compaction is concerned. This caused our data to essentially never be removed when using either STCS or LCS, and will probably affect anyone else running 1.1 with high bloom filter fp ratios. Setting our fp ratio to 0.1, running upgradesstables and running the application as it was before seems to have stabilized the load as desired, at the expense of additional jvm memory. -Bryan On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot btal...@aeriagames.com wrote: Bleh, I rushed out the email before some meetings and I messed something up. Working on reproducing now with better notes this time. -Bryan On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams de...@fyrie.net wrote: When you ran this test, is that the exact schema you used? I'm not seeing where you are setting gc_grace to 0 (although I could just be blind, it happens). On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot btal...@aeriagames.com wrote: I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7, 1.1.8, a trivial schema, and a simple script that just inserts rows. If the TTL is small enough so that all LCS data fits in generation 0 then the rows seem to be removed when TTLs expire as desired. However, if the insertion rate is high enough or the TTL long enough then the data keeps accumulating for far longer than expected. Using a 120 second TTL and a single threaded php insertion script, my MBP with SSD retained almost all of the data.
120 seconds should accumulate 5-10 MB of data. I would expect the TTL rows to be removed eventually and the cassandra load to level off at some reasonable value near 10 MB. After running for 2 hours and with a cassandra load of ~550 MB I stopped the test. The schema is

create keyspace test
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 1}
  and durable_writes = true;

use test;

create column family test
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'TimeUUIDType'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and bloom_filter_fp_chance = 1.0
  and column_metadata = [ {column_name : 'a', validation_class : LongType} ];

and the insert script is

<?php
require_once('phpcassa/1.0.a.5/autoload.php');

use phpcassa\Connection\ConnectionPool;
use phpcassa\ColumnFamily;
use phpcassa\SystemManager;
use phpcassa\UUID;

// Connect to test keyspace and column family
$sys = new SystemManager('127.0.0.1');

// Start a connection pool, create our ColumnFamily instance
$pool = new ConnectionPool('test', array('127.0.0.1'));
$testCf = new ColumnFamily($pool, 'test');

// Insert records
while( 1 ) {
    $testCf->insert(UUID::uuid1(), array('a' => 1), null, 120);
}

// Close our connections
$pool->close();
$sys->close();
?>

-Bryan On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot btal...@aeriagames.com wrote: We are using LCS and the particular row I've referenced has been involved in several compactions after all columns have TTL expired. The most recent one was again this morning and the row is still there -- TTL expired for several days now with gc_grace=0 and several compactions later ...
The most recent one was again this morning and the row is still there -- TTL expired for several days now with gc_grace=0 and several compactions later ...

$ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db

$ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
-rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db

$ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x')
{
"34353966623436302d356163652d313165322d396239322d313164363762363136336234": [
  ["app_name","50f21d3d",1357785277207001,"d"],
  ["client_ip","50f21d3d",1357785277207001,"d"],
  ["client_req_id","50f21d3d",1357785277207001,"d"],
  ["mysql_call_cnt","50f21d3d",1357785277207001,"d"],
  ["mysql_duration_us","50f21d3d",1357785277207001,"d"],
  ["mysql_failure_call_cnt","50f21d3d",1357785277207001,"d"],
  ["mysql_success_call_cnt","50f21d3d",1357785277207001,"d"],
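As an aside, the -k argument produced by the hexdump pipeline above is simply the hex encoding of the 36 ASCII characters of the UUID row key (the key_validation_class is TimeUUIDType, but sstable2json wants the raw key bytes in hex). The same encoding in Python, for illustration:

```python
# sstable2json expects the row key in hex. The row key here is the
# ASCII text of the TimeUUID, so hex-encoding those bytes reproduces
# what `echo -n <uuid> | hexdump -e '36/1 %x'` emits.
key = "459fb460-5ace-11e2-9b92-11d67b6163b4"
hex_key = key.encode("ascii").hex()
print(hex_key)
# 34353966623436302d356163652d313165322d396239322d313164363762363136336234
```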
Re: LCS not removing rows with all TTL expired columns
Thanks for letting us know. I also have some tables with a lot of activity and very short TTLs, and while I haven't experienced this problem, it's good to know just in case.

On Tue, Jan 22, 2013 at 7:35 PM, Bryan Talbot btal...@aeriagames.com wrote: It turns out that having gc_grace=0 isn't required to produce the problem. My colleague did a lot of digging into the compaction code and we think he's found the issue. It's detailed in https://issues.apache.org/jira/browse/CASSANDRA-5182 ...
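The purging rule behind Bryan's finding (a tombstone survives compaction if any other SSTable's bloom filter says the row may be present) can be sketched in a few lines. This is a simplified illustration of the idea, not Cassandra's actual code; the function and variable names are invented:

```python
def can_purge_row(key, other_sstable_blooms):
    """Compaction may drop a row's expired data and tombstones only if
    no other SSTable might still contain the row. The check consults
    bloom filters, which answer "definitely not" or "maybe"."""
    return not any(bloom(key) for bloom in other_sstable_blooms)

# bloom_filter_fp_chance = 1.0 degenerates to a filter that always says "maybe":
always_maybe = lambda key: True
# A working filter for an SSTable that truly lacks the row says "definitely not":
definitely_absent = lambda key: False

# With fp chance 1.0 everywhere, no row is ever purgeable, so
# TTL-expired data accumulates indefinitely (the CASSANDRA-5182 scenario):
assert can_purge_row("some-row-key", [always_maybe, always_maybe]) is False
# With working filters and the row absent elsewhere, it can be dropped:
assert can_purge_row("some-row-key", [definitely_absent, definitely_absent]) is True
```

This also shows why lowering fp chance to 0.1 helped: most "maybe" answers become "definitely not", so tombstones become purgeable again.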
Re: node down = log explosion?
Thanks! A node writing to the log because it cannot handle the load is much different from a node writing to the log just because. Although the amount of logging is still excessive -- would it really hurt anything to add something like "can't handle load" to the exception message?

On the subject of RF:3 -- could you please elaborate?
- Why is RF:3 important? (vs. e.g. 2)
- My total replication factor is 4 over two DCs -- I suppose you mean 3 replicas in each DC?
- Does that mean I'll have to run at least 4 nodes in each DC? (3 for RF:3 and one additional in case one fails)

(and again -- thanks Aaron! You've been helping me A LOT on this list.)

Best regards, Sergey

aaron morton wrote:

Replication is configured as DC1:2,DC2:2 (i.e. every node holds the entire data).

I really recommend using RF 3.

The error is the coordinator node protecting itself. Basically it cannot handle the volume of local writes + the writes for HH. The number of in-flight hints is greater than...

private static volatile int maxHintsInProgress = 1024 * Runtime.getRuntime().availableProcessors();

You may be able to work around this by reducing max_hint_window_in_ms in the yaml file so that hints stop being recorded once a node has been down for more than, say, 1 minute.

Anyway, I would say your test showed that the current cluster does not have sufficient capacity to handle the write load with one node down and HH enabled at the current level. You can either add more nodes, use nodes with more cores, adjust the HH settings, or reduce the throughput.

On the subject of bug report -- I probably will -- but I'll wait a bit for

Perhaps the excessive logging could be handled better; please add a ticket when you have time.
Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 23/01/2013, at 2:12 PM, Rob Coli <rcoli@> wrote:

On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir <solf.lists@> wrote: Do you have a suggestion as to what could be a better fit for counters? Something that can also replicate across DCs and survive link breakdown between nodes (across DCs)? (and no, I don't need 100.00% precision (although it would be nice obviously), I just need to be pretty close for the values of pretty)

In that case, Cassandra counters are probably fine.

On the subject of bug report -- I probably will -- but I'll wait a bit for more info here, perhaps there's some configuration or something that I just don't know about.

Excepting on the replicateOnWrite stage seems pretty unambiguous to me, and unexpected. YMMV?

=Rob
--
=Robert Coli
AIM & GTALK - rcoli@
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932p7584960.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
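As a footnote to Aaron's explanation: the quoted threshold scales with core count, so the point at which a coordinator starts shedding hinted writes is easy to estimate for a given machine. A trivial restatement in Python (not Cassandra source, just the same arithmetic):

```python
import os

def max_hints_in_progress(available_processors):
    # Mirrors the quoted Java line:
    # 1024 * Runtime.getRuntime().availableProcessors()
    return 1024 * available_processors

# An 8-core coordinator begins rejecting writes (and logging the errors
# Sergey saw) once more than 8192 hints are in flight:
print(max_hints_in_progress(8))                  # 8192
print(max_hints_in_progress(os.cpu_count() or 1))
```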