Creating a keyspace fails

2013-01-22 Thread Paul van Hoven
I just started with cassandra. Currently I'm reading the following
tutorial about CQL:
http://www.datastax.com/docs/1.1/dml/using_cql#use-cql

But I already fail when trying to create a keyspace:


$ ./cqlsh --cql3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
AND strategy_options:replication_factor='1';
Bad Request: line 1:82 mismatched input ':' expecting '='
Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh.


What is wrong?


Re: Creating a keyspace fails

2013-01-22 Thread Jason Wee
cqlsh> CREATE KEYSPACE demodb WITH replication = {'class':
'SimpleStrategy', 'replication_factor': 3};
cqlsh> use demodb;
cqlsh:demodb>


On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven 
paul.van.ho...@googlemail.com wrote:

 CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
 AND strategy_options:replication_factor='1';



Re: How to store large columns?

2013-01-22 Thread Sávio Teles
But these keys have the same prefix. So they will be distributed on the
same node, right?

2013/1/21 Jason Brown jasbr...@netflix.com

  The reason for multiple keys (and, by extension, multiple columns) is to
 better distribute the write/read load across the cluster as keys will
 (hopefully) be distributed on different nodes. This helps to avoid hot
 spots.

  Hope this helps,

  -Jason Brown
 Netflix
  --
 *From:* Sávio Teles [savio.te...@lupa.inf.ufg.br]
 *Sent:* Monday, January 21, 2013 9:51 AM
 *To:* user@cassandra.apache.org

 *Subject:* Re: How to store large columns?

 Astyanax splits large objects into multiple keys. Is it a good idea? Is it
 better to split into multiple columns?

 Thanks

 2013/1/21 Sávio Teles savio.te...@lupa.inf.ufg.br


 Thanks Keith Wright.


 2013/1/21 Keith Wright kwri...@nanigans.com

  This may be helpful:
 https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store

   From: Vegard Berget p...@fantasista.no
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org,
 Vegard Berget p...@fantasista.no
 Date: Monday, January 21, 2013 8:35 AM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: How to store large columns?



 Hi,

 You could split it into multiple columns on the client side:
 RowKeyData: Part1: [1mb], Part2: [1mb], Part3: [1mb]...PartN[1mb]

 Now you can use multiple get() in parallel to get the files back and
 then join them back to one file.

 I _think_ maybe the new CQL3-protocol does not have the same limitation,
 but I have never tried large columns there, so someone with more experience
 than me will have to confirm this.

 .vegard,


 - Original Message -
  From:
 user@cassandra.apache.org

 To:
 user@cassandra.apache.org
 Cc:

 Sent:
 Mon, 21 Jan 2013 11:16:40 -0200
 Subject:
 How to store large columns?


 We wish to store a column in a row with size larger than
 thrift_framed_transport_size_in_mb. But Thrift has a maximum frame size
 configured by thrift_framed_transport_size_in_mb in cassandra.yaml.
 So, how do we store columns with size larger than
 thrift_framed_transport_size_in_mb? Increasing this value does not solve the
 problem, since we have columns of varying sizes.





-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG
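
A minimal sketch of the client-side chunking suggested in the quoted message
above, in plain Java. The writeColumn call named in the comment is a
hypothetical placeholder, not a real client API; with an actual client
(Thrift, Hector, Astyanax) each chunk would be written as its own column
under the object's row key, and each stays below
thrift_framed_transport_size_in_mb:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class ChunkingSketch {
        // Keep each chunk comfortably below thrift_framed_transport_size_in_mb.
        static final int CHUNK_SIZE = 1024 * 1024; // 1 MB

        /** Split a large value into 1 MB pieces, one per column. */
        static List<byte[]> split(byte[] value) {
            List<byte[]> chunks = new ArrayList<byte[]>();
            for (int offset = 0; offset < value.length; offset += CHUNK_SIZE) {
                int end = Math.min(offset + CHUNK_SIZE, value.length);
                chunks.add(Arrays.copyOfRange(value, offset, end));
            }
            return chunks;
        }

        public static void main(String[] args) {
            byte[] largeValue = new byte[5 * CHUNK_SIZE + 123]; // stand-in for the big object
            List<byte[]> chunks = split(largeValue);
            for (int i = 0; i < chunks.size(); i++) {
                // writeColumn(rowKey, "Part" + (i + 1), chunk) is the hypothetical client call.
                System.out.println("would write column Part" + (i + 1)
                        + " (" + chunks.get(i).length + " bytes)");
            }
        }
    }

Reading the object back is the reverse: fetch the Part columns (in parallel
if desired) and concatenate them in order.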


Re: How to store large columns?

2013-01-22 Thread Vegard Berget
Hi,

No, the keys are hashed to be distributed, at least if you use
RandomPartitioner. From
http://www.datastax.com/docs/1.0/cluster_architecture/partitioning:
"To distribute the data evenly across the number of nodes, a hashing
algorithm creates an MD5 hash value of the row key."

.vegard,
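
A rough, hypothetical illustration of this point in plain Java (not
Cassandra's actual token code): RandomPartitioner derives its token from an
MD5 hash of the whole row key, so keys that differ only in a suffix still
produce completely different hashes and land on different parts of the ring:

    import java.math.BigInteger;
    import java.security.MessageDigest;

    public class PrefixHashDemo {
        public static void main(String[] args) throws Exception {
            // Keys sharing a prefix, e.g. chunks of one large object.
            String[] keys = { "myobject_part1", "myobject_part2", "myobject_part3" };
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String key : keys) {
                byte[] digest = md5.digest(key.getBytes("UTF-8"));
                // The digests (and so the tokens) are spread across the ring.
                System.out.println(key + " -> " + new BigInteger(1, digest).toString(16));
            }
        }
    }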
   

- Original Message -
From: user@cassandra.apache.org
Sent: Tue, 22 Jan 2013 09:40:19 -0200
Subject: Re: How to store large columns?

But these keys have the same prefix. So they will be distributed on
the same node, right?


Re: Creating a keyspace fails

2013-01-22 Thread Paul van Hoven
Okay, that worked. Why is the statement from the tutorial wrong? I
mean, why would a company like datastax post something like this?

2013/1/22 Jason Wee peich...@gmail.com:
 cqlsh CREATE KEYSPACE demodb WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': 3};
 cqlsh use demodb;
 cqlsh:demodb


 On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven
 paul.van.ho...@googlemail.com wrote:

 CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
 AND strategy_options:replication_factor='1';





Re: How to store large columns?

2013-01-22 Thread Sávio Teles
You're right Vegard! Thanks

2013/1/22 Vegard Berget p...@fantasista.no

 Hi,

 No, the keys are hashed to be distributed, at least if you use
 RandomPartitioner.
 From http://www.datastax.com/docs/1.0/cluster_architecture/partitioning:
 To distribute the data evenly across the number of nodes, a hashing
 algorithm creates an MD5 hash value of the row key

 .vegard,







Re: Creating a keyspace fails

2013-01-22 Thread Jason Wee
Maybe a typo, or they forgot to update the doc... but anyway, you can use the
help command when you are in cqlsh. For example:

cqlsh> HELP CREATE_KEYSPACE;

CREATE KEYSPACE ksname
WITH replication = {'class':'strategy' [,'option':val]};



On Tue, Jan 22, 2013 at 8:06 PM, Paul van Hoven 
paul.van.ho...@googlemail.com wrote:

 Okay, that worked. Why is the statement from the tutorial wrong? I
 mean, why would a company like datastax post something like this?

 2013/1/22 Jason Wee peich...@gmail.com:
  cqlsh CREATE KEYSPACE demodb WITH replication = {'class':
 'SimpleStrategy',
  'replication_factor': 3};
  cqlsh use demodb;
  cqlsh:demodb
 
 
  On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven
  paul.van.ho...@googlemail.com wrote:
 
  CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
  AND strategy_options:replication_factor='1';
 
 
 



node down = log explosion?

2013-01-22 Thread Sergey Olefir
I have Cassandra 1.1.7 cluster with 4 nodes in 2 datacenters (2+2).
Replication is configured as DC1:2,DC2:2 (i.e. every node holds the entire
data).

I am load-testing counter increments at the rate of about 10k per second.
All writes are directed to two nodes in DC1 (DC2 nodes are basically
backup). In total there are 100 separate clients executing 1-2 batch updates
per second.

We wanted to test what happens if one node goes down, so we brought one node
down in DC1 (i.e. the node that was handling half of the incoming writes).

This led to a complete explosion of logs on the remaining alive node in DC1.

There are hundreds of megabytes of logs within an hour all basically saying
the same thing:
ERROR [ReplicateOnWriteStage:5653390] 2013-01-22 12:44:33,611
AbstractCassandraDaemon.java (line 135) Exception in thread
Thread[ReplicateOnWriteStage:5653390,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1275)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
at
org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:311)
at
org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:585)
at
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1271)
... 3 more


The logs are completely swamped with this and are thus unusable. Of course
logs should report errors, but we don't need hundreds of megabytes of this :)
Is there anything that can be done to reduce the amount of this spam? In
addition to making the logs unusable, I strongly suspect this spam makes the
server unable to accept as many increments as it otherwise could.




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Creating a keyspace fails

2013-01-22 Thread Paul van Hoven
Alright. Thanks for you quick help. :)

2013/1/22 Jason Wee peich...@gmail.com:
 maybe typo or forget to update the doc... but anyway, you can use the help
 command when you are in cqlsh.. for example:

 cqlsh HELP CREATE_KEYSPACE;

 CREATE KEYSPACE ksname
 WITH replication = {'class':'strategy' [,'option':val]};



 On Tue, Jan 22, 2013 at 8:06 PM, Paul van Hoven
 paul.van.ho...@googlemail.com wrote:

 Okay, that worked. Why is the statement from the tutorial wrong? I
 mean, why would a company like datastax post something like this?

 2013/1/22 Jason Wee peich...@gmail.com:
  cqlsh CREATE KEYSPACE demodb WITH replication = {'class':
  'SimpleStrategy',
  'replication_factor': 3};
  cqlsh use demodb;
  cqlsh:demodb
 
 
  On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven
  paul.van.ho...@googlemail.com wrote:
 
  CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
  AND strategy_options:replication_factor='1';
 
 
 




Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Brian Tarbox
The output of this command seems to make no sense unless I think of it as 5
completely separate histograms that just happen to be displayed together.

Using this example output should I read it as: my reads all took either 1
or 2 sstables.  And separately, I had write latencies of 3, 7, 19.  And
separately I had read latencies of 2, 8, 69, etc?

In other words...each row isn't really a row...i.e. on those 16033 reads
from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row
size and 0 column count.  Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0


Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Mina Naguib


On 2013-01-22, at 8:59 AM, Brian Tarbox tar...@cabotresearch.com wrote:

 The output of this command seems to make no sense unless I think of it as 5 
 completely separate histograms that just happen to be displayed together.
 
 Using this example output should I read it as: my reads all took either 1 or 
 2 sstable.  And separately, I had write latencies of 3,7,19.  And separately 
 I had read latencies of 2, 8,69, etc?
 
 In other words...each row isn't really a row...i.e. on those 16033 reads from 
 a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size 
 and 0 column count.  Is that right?

Correct.  A number in any of the metric columns is a count value bucketed in 
the offset on that row.  There are no relationships between other columns on 
the same row.

So your first row says 16033 reads were satisfied by 1 sstable.  The other 
metrics (for example, the latency of these reads) are reflected in the histogram 
under Read Latency, under various other bucketed offsets.
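
A small, hypothetical sketch in plain Java (not nodetool's own code) of
treating one column in isolation, here the Read Latency column from the
output above, as its own list of (offset, count) buckets:

    public class HistogramColumnDemo {
        public static void main(String[] args) {
            // Read Latency buckets from the cfhistograms output above: offset -> count.
            long[] offsets = { 8, 12, 14, 17, 20, 24 };
            long[] counts  = { 2,  2,  8, 69, 163, 1369 };

            long total = 0;
            long weighted = 0;
            for (int i = 0; i < offsets.length; i++) {
                // "counts[i] reads completed in roughly offsets[i] units", independent
                // of whatever sits in the other columns on the same row.
                total += counts[i];
                weighted += offsets[i] * counts[i];
            }
            System.out.println("reads recorded: " + total);
            System.out.println("approximate mean read latency offset: " + (double) weighted / total);
        }
    }

The SSTables, Write Latency, Row Size and Column Count columns would each be
read the same way, as independent histograms.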

 
 



Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Brian Tarbox
Thank you!   Since this is a very non-standard way to display data it might
be worth a better explanation in the various online documentation sets.

Thank you again.

Brian





Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Edward Capriolo
This was described in good detail here:

http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/





Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Brian Tarbox
Indeed, but how many Cassandra users have the good fortune to stumble
across that page?  Just saying that the explanation of the very powerful
nodetool commands should be more front and center.

Brian






Re: Creating a keyspace fails

2013-01-22 Thread Colin Blower
You were most likely looking at the wrong documentation. The syntax for 
CQL3 changed between Cassandra 1.1 and 1.2. When I google cassandra 
CQL3 the first result is Cassandra 1.1 documentation about CQL3, which 
is wrong for 1.2.


Make sure you are looking at the documentation for the version you are 
using. It might also be nice for DataStax to update the 1.1 
documentation with a warning.


--
*Colin Blower*


On 01/22/2013 04:06 AM, Paul van Hoven wrote:

Okay, that worked. Why is the statement from the tutorial wrong? I
mean, why would a company like datastax post something like this?

2013/1/22 Jason Wee peich...@gmail.com:

cqlsh CREATE KEYSPACE demodb WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 3};
cqlsh use demodb;
cqlsh:demodb


On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven
paul.van.ho...@googlemail.com wrote:

CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
AND strategy_options:replication_factor='1';








Cassandra source code explained

2013-01-22 Thread Radek Gruchalski
Hi everyone,  

I am looking for any places where the Cassandra source code structure would be 
explained.
Are there any articles / wiki available?

Kind regards,

Radek Gruchalski
radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com






Re: Cassandra source code explained

2013-01-22 Thread Michael Kjellman
http://wiki.apache.org/cassandra/ArchitectureInternals




Re: Cassandra source code explained

2013-01-22 Thread Radek Gruchalski
Thank you. I found this, but I was hoping there was something broader out there.
This will have to be enough.

Kind regards,

Radek Gruchalski
radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com
00447889948663








Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Wei Zhu
I agree that Cassandra cfhistograms is probably the most bizarre metric I have 
ever come across, although it's extremely useful. 

I believe the offset is actually the value being tracked (the x-axis on a 
traditional histogram) and the number under each column is how many times that 
value has been recorded (the y-axis on a traditional histogram). Your write 
latencies are 17, 20, 24 (microseconds?). 3 writes took 17, 7 writes took 20 and 
19 writes took 24.

Correct me if I am wrong.

Thanks.
-Wei





Re: Creating a keyspace fails

2013-01-22 Thread Tyler Hobbs
I sent a note to our docs team to add a warning/note to the docs there
about the difference between the syntax in 1.1 and 1.2.

Thanks!


On Tue, Jan 22, 2013 at 10:49 AM, Colin Blower cblo...@barracuda.com wrote:

  You were most likely looking at the wrong documentation. The syntax for
 CQL3 changed between Cassandra 1.1 and 1.2. When I google cassandra CQL3
 the first result is Cassandra 1.1 documentation about CQL3, which is wrong
 for 1.2.

 Make sure you are looking at the documentation for the version you are
 using. It might also be nice for DataStax to update the 1.1 documentation
 with a warning.

 --
  *Colin Blower*


 On 01/22/2013 04:06 AM, Paul van Hoven wrote:

 Okay, that worked. Why is the statement from the tutorial wrong? I
 mean, why would a company like datastax post something like this?

 2013/1/22 Jason Wee peich...@gmail.com peich...@gmail.com:

  cqlsh CREATE KEYSPACE demodb WITH replication = {'class': 'SimpleStrategy',
 'replication_factor': 3};
 cqlsh use demodb;
 cqlsh:demodb


 On Tue, Jan 22, 2013 at 7:04 PM, Paul van Hoven
 paul.van.ho...@googlemail.com wrote:

  CREATE KEYSPACE demodb WITH strategy_class = 'SimpleStrategy'
 AND strategy_options:replication_factor='1';







-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Cassandra timeout whereas it is not much busy

2013-01-22 Thread aaron morton
 I have seen logs about that. I didn't worry much, since the GC of the jvm was 
 not under pressure. 
When cassandra logs a ParNew event from the GCInspector, that is time the server 
was paused / frozen. CMS events have a very small pause, but they are taking a 
non trivial amount of CPU time. 

If you are logging a lot of GC events you should look into it. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/01/2013, at 3:28 AM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote:

 Le 17 janv. 2013 à 05:00, aaron morton aa...@thelastpickle.com a écrit :
 
 Check the disk utilisation using iostat -x 5
 If you are on a VM / in the cloud check for CPU steal. 
 Check the logs for messages from the GCInspector, the ParNew events are 
 times the JVM is paused. 
 
 I have seen logs about that. I didn't worry much, since the GC of the jvm was 
 not under pressure. As far as I understand, unless a CF is continuously 
 flushed, it should not be a major issue, isn't it ?
 I don't know for sure if there was a lot of flush though, since my nodes were 
 not properly monitored.
 
 Look at the times dropped messages are logged and try to correlate them with 
 other server events.
 
 I tried that with not much success. I have graphs on cacti though, so this is 
 quite hard to visualize when things happen simultaneously on several graphs.
 
 If you have a lot secondary indexes, or a lot of memtables flushing at the 
 some time you may be blocking behind the global Switch Lock. If you use 
 secondary indexes make sure the memtable_flush_queue_size is set correctly, 
 see the comments in the yaml file.
 
 I have no secondary indexes.
 
 If you have a lot of CF's flushing at the same time, and there are not 
 messages from the MeteredFlusher, it may be the log segment is too big for 
 the number of CF's you have. When the segment needs to be recycled all dirty 
 CF's are flushed, if you have a lot of cf's this can result in blocking 
 around the switch lock. Trying reducing the commitlog_segment_size_in_mb so 
 that less CF's are flushed.
 
 What is a lot ? We have 26 CF. 9 are barely used. 15 contains time series 
 data (cassandra rocks with them) in which only 3 of them have from 1 to 10 
 read or writes per sec. 1 quite hot (200read/s) which is mainly used for its 
 bloom filter (which disksize is about 1G). And 1 also hot used only for 
 writes (which has the same big bloom filter, which I am about to remove since 
 it is useless).
 
 BTW, thanks for the pointers. I have not tried yet to put our nodes under 
 pressure. But when I'll do, I'll look at those pointers closely.
 
 Nicolas
 
 
 Hope that helps
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 17/01/2013, at 10:30 AM, Nicolas Lalevée nicolas.lale...@hibnet.org 
 wrote:
 
 Hi,
 
 I have a strange behavior I am not able to understand.
 
 I have 6 nodes with cassandra-1.0.12. Each nodes have 8G of RAM. I have a 
 replication factor of 3.
 
 ---
 my story is maybe too long, trying shorter here, while saving what I wrote 
 in case someone has patience to read my bad english ;)
 
 I got into a situation where my cluster was generating a lot of timeouts 
 on our frontend, whereas I could not see any major trouble in the internal 
 stats. Actually cpu and read & write counts on the column families were quite 
 low. A mess, until I switched from java7 to java6 and forced the use of 
 jamm. After the switch, cpu and read & write counts were going up again, 
 timeouts gone. I have seen this behavior while reducing the xmx too.
 
 What could be blocking cassandra from utilizing the whole resources of the 
 machine? Are there metrics I didn't see which could explain this?
 
 ---
 Here is the long story.
 
 When I first set my cluster up, I blindly gave 6G of heap to the cassandra 
 nodes, thinking that the more a java process has, the smoother it runs, while 
 keeping some RAM for the disk cache. We got some new feature deployed, and 
 things went to hell, some machines going up to 60% wa. I give credit to 
 cassandra because there were not that many timeouts received on the web 
 frontend; it was kind of slow but it was kind of working. With some 
 optimizations, we reduced the pressure of the new feature, but it was still 
 at 40% wa.
 
 At that time I didn't have much monitoring, just heap and cpu. I read some 
 articles on how to tune, and I learned that the disk cache is quite important 
 because cassandra relies on it to be the read cache. So I tried many 
 xmx values, and 3G seemed kind of the lowest possible. So on 2 of the 6 nodes, I 
 set the xmx to 3.3G. Amazingly, I saw the wa go down to 10%. Quite happy with 
 that, I changed the xmx to 3.3G on every node. But then things really went to 
 hell, a lot of timeouts on the frontend. It was not working at all. So I 
 rolled back.
 
 After some time, 

Re: Concurrent write performance

2013-01-22 Thread aaron morton
Background see my talk here 
http://www.datastax.com/events/cassandrasummit2012/presentations

Mutations to a row are isolated. In practice this means that simultaneous 
writes to the same row are possible, however the first write thread to complete 
wins and the other threads start their work again. 

So if you have one very hot row you will see less throughput as the writers 
will have to do some re-work. I did a little test here 
http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
 on slide 30. 

  sort the columns at the same time?
Don't worry about sorting. 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/01/2013, at 4:40 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com 
wrote:

 Do you experience any performance problems?
  
 This will be the last thing to look at.
  
  
 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer
 
 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider
 Take a ride with Adform's Rich Media Suite
 
 
 From: Jay Svc [mailto:jaytechg...@gmail.com] 
 Sent: Monday, January 21, 2013 17:28
 To: user@cassandra.apache.org
 Subject: Concurrent write performance
  
 Folks,
  
 I would like to write(insert or update) to a single row in a column family. I 
 have concurrent requests which will write to a single row. Do we see any 
 performance implications because of concurrent writes to a single row where 
 comparator has to sort the columns at the same time?
  
 Please share your thoughts.
  
 Thanks,
 Jay
 



Re: sstable2json had random behavior

2013-01-22 Thread aaron morton
William,
If the solution from Binh works for you can you please submit a ticket 
to https://issues.apache.org/jira/browse/CASSANDRA

The error message could be better if that is the case. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/01/2013, at 9:16 AM, Binh Nguyen binhn...@gmail.com wrote:

 Hi William,
 
 I also saw this one before but it always happened in my case when I have only 
 Data and Index files. The problem goes away when I have all another files 
 (Compression, Filter...)
 
 
 On Mon, Jan 21, 2013 at 11:36 AM, William Oberman ober...@civicscience.com 
 wrote:
 I'm running 1.1.6 from the datastax repo.  
 
 I ran sstable2json and got the following error:
 Exception in thread main java.io.IOError: java.io.IOException: dataSize of 
 7020023552240793698 starting at 993981393 would be larger than file 
 /var/lib/cassandra/data/X-Data.db length 7502161255
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:156)
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:86)
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:70)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:187)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:151)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:143)
 at 
 org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:309)
 at 
 org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:340)
 at 
 org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:353)
 at 
 org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:418)
 Caused by: java.io.IOException: dataSize of 7020023552240793698 starting at 
 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db 
 length 7502161255
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.init(SSTableIdentityIterator.java:115)
 ... 9 more
 
 
 I ran it again, and didn't.  This makes me worried :-)  Does anyone else ever 
 see this class of error, and does it ever disappear for them?
 



Re: Cassandra timeout whereas it is not much busy

2013-01-22 Thread Rob Coli
On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
nicolas.lale...@hibnet.org wrote:
 Here is the long story.
 After some long useless staring at the monitoring graphs, I gave a try to
 using the openjdk 6b24 rather than openjdk 7u9

OpenJDK 6 and 7 are both counter-recommended with regards to
Cassandra. I've heard reports of mysterious behavior like the behavior
you describe, when using OpenJDK 7.

Try using the Sun/Oracle JVM? Is your JNA working?

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Cassandra pending compaction tasks keeps increasing

2013-01-22 Thread Wei Zhu
Thanks Aaron and Jim for your reply. The data import is done. We have about 
135G on each node and about 28K SSTables. For normal operation, we only 
have about 90 writes per second, but when I run nodetool compactionstats, it 
remains at 9 and hardly changes. I guess it's just an estimated number.

When I ran histogram,

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1           2644              0             0         0      18660057
2           8204              0             0         0       9824270
3          11198              0             0         0       6968475
4           4269              6             0         0       5510745
5            517             29             0         0       4595205


You can see about half of the reads hit 3 SSTables. The majority of read 
latencies are under 5ms, only a dozen are over 10ms. We haven't fully turned on 
reads yet, only 60 reads per second. We saw about 20 read timeouts during the 
past 12 hours. Not a single warning in the Cassandra log. 

Is it normal for Cassandra to time out some requests? We set the rpc timeout to 
1s, so it shouldn't time out any of them?

Thanks.
-Wei



 From: aaron morton aa...@thelastpickle.com
To: user@cassandra.apache.org 
Sent: Monday, January 21, 2013 12:21 AM
Subject: Re: Cassandra pending compaction tasks keeps increasing
 

The main guarantee LCS gives you is that most reads will only touch 1 row 
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

If compaction is falling behind this may not hold.

nodetool cfhistograms tells you how many SSTables were read from for reads.  
It's a recent histogram that resets each time you read from it. 

Also, parallel levelled compaction in 1.2 
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/01/2013, at 7:49 AM, Jim Cistaro jcist...@netflix.com wrote:

1) In addition to iostat, dstat is a good tool to see what kind of disk 
throughput you are getting.  That would be one thing to monitor.
2) For LCS, we also see pending compactions skyrocket.  During load, LCS will 
create a lot of small sstables which will queue up for compaction.
3) For us the biggest concern is not how high the pending count gets, but how 
often it gets back down near zero.  If your load is something you can do in 
segments or pause, then you can see how fast the cluster recovers on the 
compactions.
4) One thing which we tune per cluster is the size of the files.  Increasing 
this from 5MB can sometimes improve things.  But I forget if we have ever 
changed this after starting data load.


Is your cluster receiving read traffic during this data migration? If so, I 
would say that read latency is your best measure.  If the high number of 
SSTables waiting to compact is not hurting your reads, then you are probably 
ok.  Since you are on SSD, there is a good chance the compactions are not 
hurting you.  As for compactionthroughput, we set ours high for SSD.  You 
usually won't use it all because the compactions are usually single threaded.  
Dstat will help you measure this.


I hope this helps,
jc

From: Wei Zhu wz1...@yahoo.com
Reply-To: user@cassandra.apache.org user@cassandra.apache.org, Wei Zhu 
wz1...@yahoo.com
Date: Friday, January 18, 2013 12:10 PM
To: Cassandr usergroup user@cassandra.apache.org
Subject: Cassandra pending compaction tasks keeps increasing



Hi,
When I run nodetool compactionstats


I see the number of pending tasks keep going up steadily. 


I tried to increase the  compactionthroughput, by using


nodetool setcompactionthroughput


I even tried the extreme to set it to 0 to disable the throttling. 


I checked iostats and we have SSD for data, the disk util is less than 5% 
which means it's not I/O bound, CPU is also less than 10%


We are using levelcompaction and in the process of migrating data. We have 
4500 writes per second and very few reads. We have about 70G data now and will 
grow to 150G when the migration finishes. We only have one CF and right now 
the number of  SSTable is around 15000, write latency is still under 0.1ms. 


Anything needs to be concerned? Or anything I can do to reduce the number of 
pending compaction?


Thanks.
-Wei



 

Re: sstable2json had random behavior

2013-01-22 Thread William Oberman
No, I have the other files unfortunately and I had it fail once and succeed
every time after.

I'm tracking the external information of sstable2json more carefully now
(exit status, stdout, stderr), so hopefully if it happens again I can be
more help.

will







Re: node down = log explosion?

2013-01-22 Thread Rob Coli
On Tue, Jan 22, 2013 at 5:03 AM, Sergey Olefir solf.li...@gmail.com wrote:
 I am load-testing counter increments at the rate of about 10k per second.

Do you need highly performant counters that count accurately, without
meaningful chance of over-count? If so, Cassandra's counters are
probably not ideal.

 We wanted to test what happens if one node goes down, so we brought one node
 down in DC1 (i.e. the node that was handling half of the incoming writes).
 ...
 This led to a complete explosion of logs on the remaining alive node in DC1.

I agree, this level of exception logging during replicateOnWrite
(which is called every time a counter is incremented) seems like a
bug. I would file a bug at the Apache JIRA.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Cassandra pending compaction tasks keeps increasing

2013-01-22 Thread Jim Cistaro
What version are you using?  Are you seeing any compaction related assertions 
in the logs?

Might be https://issues.apache.org/jira/browse/CASSANDRA-4411

We experienced this problem of the count only decreasing to a certain number 
and then stopping.  If you are idle, it should go to 0.  I have not seen it 
overestimate for zero, only for non-zero amounts.

As for timeouts etc, you will need to look at things like nodetool tpstats to 
see if you have pending transactions queueing up.

Jc

From: Wei Zhu wz1...@yahoo.commailto:wz1...@yahoo.com
Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org, Wei Zhu 
wz1...@yahoo.commailto:wz1...@yahoo.com
Date: Tuesday, January 22, 2013 12:56 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Cassandra pending compaction tasks keeps increasing

Thanks Aaron and Jim for your reply. The data import is done. We have about 
135G on each node and it's about 28K SStables. For normal operation, we only 
have about 90 writes per seconds, but when I ran nodetool compationstats, it 
remains at 9 and hardly changes. I guess it's just an estimated number.

When I ran histogram,

Offset  SSTables Write Latency  Read Latency  Row Size  
Column Count
1   2644 0 0 0  
18660057
2   8204 0 0 0  
 9824270
3  11198 0 0 0  
 6968475
4   4269 6 0 0  
 5510745
551729 0 0  
 4595205


You can see about half of the reads result in 3 SSTables. Majority of read 
latency are under 5ms, only a dozen are over 10ms. We haven't fully turn on 
reads yet, only 60 reads per second. We see about 20 read timeout during the 
past 12 hours. Not a single warning from Cassandra Log.

Is it normal for Cassandra to timeout some requests? We set rpc timeout to be 
1s, it shouldn't time out any of them?

Thanks.
-Wei


From: aaron morton aa...@thelastpickle.commailto:aa...@thelastpickle.com
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Sent: Monday, January 21, 2013 12:21 AM
Subject: Re: Cassandra pending compaction tasks keeps increasing

The main guarantee LCS gives you is that most reads will only touch 1 row 
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

If compaction is falling behind this may not hold.

nodetool cfhistograms tells you how many SSTables were read from for reads.  
It's a recent histogram that resets each time you read from it.
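
For example (keyspace and column family names are placeholders):

nodetool -h localhost cfhistograms <keyspace> <column_family>

The SSTables column is a histogram of how many SSTables recent reads touched,
with the Offset column giving the bucket value.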

Also, parallel levelled compaction in 1.2 
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/01/2013, at 7:49 AM, Jim Cistaro jcist...@netflix.com wrote:

1) In addition to iostat, dstat is a good tool to see what kind of disk 
throughput you are getting.  That would be one thing to monitor.
2) For LCS, we also see pending compactions skyrocket.  During load, LCS will 
create a lot of small sstables which will queue up for compaction.
3) For us the biggest concern is not how high the pending count gets, but how 
often it gets back down near zero.  If your load is something you can do in 
segments or pause, then you can see how fast the cluster recovers on the 
compactions.
4) One thing which we tune per cluster is the size of the files.  Increasing 
this from 5MB can sometimes improve things.  But I forget if we have ever 
changed this after starting data load.
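
A sketch of how that file size could be changed from cassandra-cli (the column
family name and the 10 MB value are placeholders, not recommendations):

update column family your_cf
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb : 10};

As noted above, it is unclear how well changing this behaves once data has
already been loaded.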

Is your cluster receiving read traffic during this data migration? If so, I 
would say that read latency is your best measure.  If the high number of 
SSTables waiting to compact is not hurting your reads, then you are probably 
ok.  Since you are on SSD, there is a good chance the compactions are not 
hurting you.  As for compaction throughput, we set ours high for SSD.  You 
usually won't use it all because compactions are usually single-threaded.  
Dstat will help you measure this.
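
As a sketch (the numbers are illustrative, not recommendations), the limit lives
in cassandra.yaml and can also be changed at runtime:

compaction_throughput_mb_per_sec: 64

$ nodetool -h localhost setcompactionthroughput 64

Setting the value to 0 disables compaction throttling entirely.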

I hope this helps,
jc

From: Wei Zhu wz1...@yahoo.com
Reply-To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com
Date: Friday, January 18, 2013 12:10 PM
To: Cassandr usergroup user@cassandra.apache.org
Subject: Cassandra pending compaction tasks keeps increasing

Hi,
When I run nodetool compactionstats

I see the number of pending tasks keeps going up steadily.

I tried to increase the  

Re: node down = log explosion?

2013-01-22 Thread Sergey Olefir
Do you have a suggestion as to what could be a better fit for counters?
Something that can also replicate across DCs and survive link breakdown
between nodes (across DCs)? (and no, I don't need 100.00% precision
(although it would be nice obviously), I just need to be pretty close for
the values of pretty)

On the subject of a bug report -- I probably will file one -- but I'll wait a bit for
more info here; perhaps there's some configuration option or something that I just
don't know about. 


Rob Coli wrote
 On Tue, Jan 22, 2013 at 5:03 AM, Sergey Olefir solf.lists@ wrote:
 I am load-testing counter increments at the rate of about 10k per second.
 
 Do you need highly performant counters that count accurately, without
 meaningful chance of over-count? If so, Cassandra's counters are
 probably not ideal.
 
 We wanted to test what happens if one node goes down, so we brought one
 node
 down in DC1 (i.e. the node that was handling half of the incoming
 writes).
 ...
 This led to a complete explosion of logs on the remaining alive node in
 DC1.
 
 I agree, this level of exception logging during replicateOnWrite
 (which is called every time a counter is incremented) seems like a
 bug. I would file a bug at the Apache JIRA.
 
 =Rob
 
 -- 
 =Robert Coli
 AIMGTALK - rcoli@
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932p7584954.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: node down = log explosion?

2013-01-22 Thread Rob Coli
On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir solf.li...@gmail.com wrote:
 Do you have a suggestion as to what could be a better fit for counters?
 Something that can also replicate across DCs and survive link breakdown
 between nodes (across DCs)? (and no, I don't need 100.00% precision
 (although it would be nice obviously), I just need to be pretty close for
 the values of pretty)

In that case, Cassandra counters are probably fine.

 On the subject of bug report -- I probably will -- but I'll wait a bit for
 more info here, perhaps there's some configuration or something that I just
 don't know about.

Excepting on replicateOnWrite stage seems pretty unambiguous to me,
and unexpected. YMMV?

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Cassandra source code explained

2013-01-22 Thread Manu Zhang

On Wed 23 Jan 2013 01:10:58 AM CST, Radek Gruchalski wrote:

Thank you. I found this, but was hoping there was something broader
out there.
This will have to be enough.

Kind regards,

Radek Gruchalski
radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com
00447889948663

Confidentiality:
This communication is intended for the above-named person and may be
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it,
nor must you copy or show it to anyone; please delete/destroy and
inform the sender immediately.

On Tuesday, 22 January 2013 at 18:08, Michael Kjellman wrote:


http://wiki.apache.org/cassandra/ArchitectureInternals

From: Radek Gruchalski radek.gruchal...@portico.io
Reply-To: user@cassandra.apache.org
Date: Tuesday, January 22, 2013 9:07 AM
To: user@cassandra.apache.org
Subject: Cassandra source code explained

Hi everyone,

I am looking for any place where the Cassandra source code structure
is explained.
Are there any articles or wiki pages available?

Kind regards,
Radek Gruchalski
radek.gruchal...@technicolor.com | radek.gruchal...@portico.io | ra...@gruchalski.com

Confidentiality:
This communication is intended for the above-named person and may be
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it,
nor must you copy or show it to anyone; please delete/destroy and
inform the sender immediately.





Here are two slide decks to get you started:
http://www.slideshare.net/gdusbabek/getting-to-know-the-cassandra-codebase
http://www.slideshare.net/gdusbabek/cassandra-codebase-2011



Re: node down = log explosion?

2013-01-22 Thread aaron morton
 Replication is configured as DC1:2,DC2:2 (i.e. every node holds the entire
 data).
I really recommend using RF 3. 


The error is the coordinator node protecting itself. 

Basically it cannot handle the volume of local writes + the writes for HH.  The 
number of in-flight hints is greater than…

private static volatile int maxHintsInProgress = 1024 * Runtime.getRuntime().availableProcessors();

You may be able to work around this by reducing max_hint_window_in_ms in 
the yaml file, so that hints stop being recorded once a node has been down for more 
than, say, 1 minute. 
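
A sketch of what that could look like in cassandra.yaml (the 60000 ms value is
just an illustration of the 1 minute above):

# stop collecting hints for a node once it has been down longer than this
max_hint_window_in_ms: 60000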

Anyways I would say your test showed that the current cluster does not have 
sufficient capacity to handle the write load with one node down and HH enabled 
at the current level. You can either add more nodes, use nodes with more cores, 
adjust the HH settings, or reduce the throughput. 


 On the subject of bug report -- I probably will -- but I'll wait a bit for

Perhaps the excessive logging could be handled better; please add a ticket when 
you have time. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/01/2013, at 2:12 PM, Rob Coli rc...@palominodb.com wrote:

 On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir solf.li...@gmail.com wrote:
 Do you have a suggestion as to what could be a better fit for counters?
 Something that can also replicate across DCs and survive link breakdown
 between nodes (across DCs)? (and no, I don't need 100.00% precision
 (although it would be nice obviously), I just need to be pretty close for
 the values of pretty)
 
 In that case, Cassandra counters are probably fine.
 
 On the subject of bug report -- I probably will -- but I'll wait a bit for
 more info here, perhaps there's some configuration or something that I just
 don't know about.
 
 Excepting on replicateOnWrite stage seems pretty unambiguous to me,
 and unexpected. YMMV?
 
 =Rob
 
 -- 
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb



Re: LCS not removing rows with all TTL expired columns

2013-01-22 Thread Bryan Talbot
It turns out that having gc_grace=0 isn't required to produce the problem.
 My colleague did a lot of digging into the compaction code and we think
he's found the issue.  It's detailed in
https://issues.apache.org/jira/browse/CASSANDRA-5182

Basically tombstones for a row will not be removed from an SSTable during
compaction if the row appears in other SSTables; however, the compaction
code checks the bloom filters to make this determination.  Since this data
is rarely read we had bloom_filter_fp_chance set to 1.0, which makes rows
seem to appear in every SSTable as far as compaction is concerned.

This caused our data to essentially never be removed when using either STCS
or LCS, and will probably affect anyone else running 1.1 with high bloom
filter fp ratios.

Setting our fp ratio to 0.1, running upgradesstables and running the
application as it was before seems to have stabilized the load as desired
at the expense of additional jvm memory.
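
For anyone else hitting this, a sketch of the change from cassandra-cli and
nodetool (keyspace and column family names are placeholders):

update column family your_cf with bloom_filter_fp_chance = 0.1;

$ nodetool -h localhost upgradesstables your_keyspace your_cf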

-Bryan


On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot btal...@aeriagames.comwrote:

 Bleh, I rushed out the email before some meetings and I messed something
 up.  Working on reproducing now with better notes this time.

 -Bryan



 On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams de...@fyrie.net wrote:

 When you ran this test, is that the exact schema you used? I'm not seeing
 where you are setting gc_grace to 0 (although I could just be blind, it
 happens).


 On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot btal...@aeriagames.comwrote:

 I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
 1.1.8, a trivial schema, and a simple script that just inserts rows.  If
 the TTL is small enough that all LCS data fits in generation 0, then the
 rows seem to be removed when their TTL expires, as desired.  However, if the
 insertion rate is high enough or the TTL long enough, then the data keeps
 accumulating for far longer than expected.

 Using a 120 second TTL and a single-threaded PHP insertion script, my MBP
 with SSD retained almost all of the data.  120 seconds should accumulate
 5-10 MB of data.  I would expect the TTL'd rows to be removed eventually and
 the cassandra load to level off at some reasonable value near 10 MB.
  After running for 2 hours and with a cassandra load of ~550 MB I stopped
 the test.

 The schema is

 create keyspace test
   with placement_strategy = 'SimpleStrategy'
   and strategy_options = {replication_factor : 1}
   and durable_writes = true;

 use test;

 create column family test
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'TimeUUIDType'
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and column_metadata = [
 {column_name : 'a',
 validation_class : LongType}];


 and the insert script is

 <?php

 require_once('phpcassa/1.0.a.5/autoload.php');

 use phpcassa\Connection\ConnectionPool;
 use phpcassa\ColumnFamily;
 use phpcassa\SystemManager;
 use phpcassa\UUID;

 // Connect to test keyspace and column family
 $sys = new SystemManager('127.0.0.1');

 // Start a connection pool, create our ColumnFamily instance
 $pool = new ConnectionPool('test', array('127.0.0.1'));
 $testCf = new ColumnFamily($pool, 'test');

 // Insert records
 while( 1 ) {
   $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
 }

 // Close our connections
 $pool->close();
 $sys->close();

 ?>


 -Bryan




 On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot 
 btal...@aeriagames.comwrote:

 We are using LCS and the particular row I've referenced has been
 involved in several compactions after all columns have TTL expired.  The
 most recent one was again this morning and the row is still there -- TTL
 expired for several days now with gc_grace=0 and several compactions later
 ...


 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
 -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db


 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 "%x"')
  {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 [client_req_id,50f21d3d,1357785277207001,d],
 [mysql_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_duration_us,50f21d3d,1357785277207001,d],
 [mysql_failure_call_cnt,50f21d3d,1357785277207001,d],
 [mysql_success_call_cnt,50f21d3d,1357785277207001,d],
 

Re: LCS not removing rows with all TTL expired columns

2013-01-22 Thread Derek Williams
Thanks for letting us know. I also have some tables with a lot of
activity and very short TTLs, and while I haven't experienced this problem,
it's good to know just in case.


On Tue, Jan 22, 2013 at 7:35 PM, Bryan Talbot btal...@aeriagames.comwrote:

 It turns out that having gc_grace=0 isn't required to produce the problem.
  My colleague did a lot of digging into the compaction code and we think
 he's found the issue.  It's detailed in
 https://issues.apache.org/jira/browse/CASSANDRA-5182

 Basically tombstones for a row will not be removed from an SSTable during
 compaction if the row appears in other SSTables; however, the compaction
 code checks the bloom filters to make this determination.  Since this data
 is rarely read we had the bloom_filter_fp_ratio set to 1.0 which makes rows
 seem to appear in every SSTable as far as compaction is concerned.

 This caused our data to essentially never be removed when using either
 STSC or LCS and will probably affect anyone else running 1.1 with high
 bloom filter fp ratios.

 Setting our fp ratio to 0.1, running upgradesstables and running the
 application as it was before seems to have stabilized the load as desired
 at the expense of additional jvm memory.

 -Bryan


 On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot btal...@aeriagames.comwrote:

 Bleh, I rushed out the email before some meetings and I messed something
 up.  Working on reproducing now with better notes this time.

 -Bryan



 On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams de...@fyrie.net wrote:

 When you ran this test, is that the exact schema you used? I'm not
 seeing where you are setting gc_grace to 0 (although I could just be blind,
 it happens).


 On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot btal...@aeriagames.comwrote:

 I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7,
 1.1.8, a trivial schema, and a simple script that just inserts rows.  If
 the TTL is small enough so that all LCS data fits in generation 0 then the
 rows seem to be removed with TTL expires as desired.  However, if the
 insertion rate is high enough or the TTL long enough then the data keep
 accumulating for far longer than expected.

 Using 120 second TTL and a single threaded php insertion script my MBP
 with SSD retained almost all of the data.  120 seconds should accumulate
 5-10 MB of data.  I would expect that TTL rows to be removed eventually and
 for the cassandra load to level off at some reasonable value near 10 MB.
  After running for 2 hours and with a cassandra load of ~550 MB I stopped
 the test.

 The schema is

 create keyspace test
   with placement_strategy = 'SimpleStrategy'
   and strategy_options = {replication_factor : 1}
   and durable_writes = true;

 use test;

 create column family test
   with column_type = 'Standard'
   and comparator = 'UTF8Type'
   and default_validation_class = 'UTF8Type'
   and key_validation_class = 'TimeUUIDType'
   and compaction_strategy =
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'NONE'
   and bloom_filter_fp_chance = 1.0
   and column_metadata = [
 {column_name : 'a',
 validation_class : LongType}];


 and the insert script is

 <?php

 require_once('phpcassa/1.0.a.5/autoload.php');

 use phpcassa\Connection\ConnectionPool;
 use phpcassa\ColumnFamily;
 use phpcassa\SystemManager;
 use phpcassa\UUID;

 // Connect to test keyspace and column family
 $sys = new SystemManager('127.0.0.1');

 // Start a connection pool, create our ColumnFamily instance
 $pool = new ConnectionPool('test', array('127.0.0.1'));
 $testCf = new ColumnFamily($pool, 'test');

 // Insert records
 while( 1 ) {
   $testCf->insert(UUID::uuid1(), array("a" => 1), null, 120);
 }

 // Close our connections
 $pool->close();
 $sys->close();

 ?>


 -Bryan




 On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot 
 btal...@aeriagames.comwrote:

 We are using LCS and the particular row I've referenced has been
 involved in several compactions after all columns have TTL expired.  The
 most recent one was again this morning and the row is still there -- TTL
 expired for several days now with gc_grace=0 and several compactions later
 ...


 $ ./bin/nodetool -h localhost getsstables metrics request_summary
 459fb460-5ace-11e2-9b92-11d67b6163b4

 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db

 $ ls -alF
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
 -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db


 $ ./bin/sstable2json
 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump  -e '36/1 
 %x')
  {
 34353966623436302d356163652d313165322d396239322d313164363762363136336234:
 [[app_name,50f21d3d,1357785277207001,d],
 [client_ip,50f21d3d,1357785277207001,d],
 

Re: node down = log explosion?

2013-01-22 Thread Sergey Olefir
Thanks!

A node writing to the log because it cannot handle the load is much different from
a node writing to the log 'just because'. Although the amount of logging is still
excessive -- would it really hurt anything to add something like 'can't
handle load' to the exception message?

On the subject of RF:3 -- could you please elaborate?
- Why is RF:3 important? (vs. e.g. 2)
- My total replication factor is 4 over two DCs -- I suppose you mean 3
replicas in each DC?
- Does that mean I'll have to run at least 4 nodes in each DC? (3 for RF:3
and one additional in case one fails)

(and again -- thanks Aaron! You've been helping me A LOT on this list.)
Best regards,
Sergey


aaron morton wrote
 Replication is configured as DC1:2,DC2:2 (i.e. every node holds the
 entire
 data).
 I really recommend using RF 3. 
 
 
 The error is the coordinator node protecting it's self. 
 
 Basically it cannot handle the volume of local writes + the writes for HH. 
 The number of in flight hints is greater than…
 
 private static volatile int maxHintsInProgress = 1024 *
 Runtime.getRuntime().availableProcessors();
 
 You may be able to work around this by reducing the max_hint_window_in_ms
 in the yaml file so that hints are recorded if say the node has been down
 for more than 1 minute. 
 
 Anyways I would say your test showed that the current cluster does not
 have sufficient capacity to handle the write load with one node down and
 HH enabled at the current level. You can either add more nodes, use nodes
 with more cores, adjust the HH settings, or reduce the throughput. 
 
 
 On the subject of bug report -- I probably will -- but I'll wait a bit
 for
 
 perhaps the excessive logging could be handled better, please add a ticket
 when you have time. 
 
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23/01/2013, at 2:12 PM, Rob Coli rcoli@ wrote:
 
 On Tue, Jan 22, 2013 at 2:57 PM, Sergey Olefir solf.lists@ wrote:
 Do you have a suggestion as to what could be a better fit for counters?
 Something that can also replicate across DCs and survive link breakdown
 between nodes (across DCs)? (and no, I don't need 100.00% precision
 (although it would be nice obviously), I just need to be pretty close
 for
 the values of pretty)
 
 In that case, Cassandra counters are probably fine.
 
 On the subject of bug report -- I probably will -- but I'll wait a bit
 for
 more info here, perhaps there's some configuration or something that I
 just
 don't know about.
 
 Excepting on replicateOnWrite stage seems pretty unambiguous to me,
 and unexpected. YMMV?
 
 =Rob
 
 -- 
 =Robert Coli
 AIMGTALK - rcoli@
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/node-down-log-explosion-tp7584932p7584960.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.