Re: Difference in retrieving data from cassandra

2014-09-26 Thread Umang Shah
Hey Jonathan,

Thanks for your reply.
I created the schema in this manner:

CREATE SCHEMA schemaname WITH replication = { 'class' : 'SimpleStrategy',
'replication_factor' : 1 };
and the tables according to the requirements.

I didn't use a node structure.

Could that be the reason for the performance difference?

And can you also tell me what the difference is between the structure I
used and a node structure?

Regards,
Umang Shah
BI-ETL Developer

On Thu, Sep 25, 2014 at 4:48 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 You'll need to provide a bit of information.  To start, a query trace
 would be helpful.


 http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

 (self promo) You may want to read over my blog post regarding
 diagnosing problems in production.  I've covered diagnosing slow
 queries:
 http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
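
 For reference, a trace can be captured directly from cqlsh; a minimal
 sketch (the keyspace, table and key below are only placeholders):

 -- Enable tracing, run the slow query once, inspect the per-step latencies
 -- that cqlsh prints, then disable tracing again.
 TRACING ON;
 SELECT * FROM schemaname.tablename WHERE id = 1234;
 TRACING OFF;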


 On Thu, Sep 25, 2014 at 4:21 AM, Umang Shah shahuma...@gmail.com wrote:
  Hi All,
 
  I am using Cassandra with Pentaho PDI (Kettle). I have installed Cassandra
  on an Amazon EC2 instance and on my local machine. When I retrieve data
  from the local machine using Pentaho PDI it takes a few seconds (not more
  than 20 seconds), but doing the same against the production database takes
  almost 3 minutes for the same amount of data, which is a huge difference.
 
  Could anybody suggest what I need to check, or how I can narrow down this
  difference?
 
  The local machine and the production server have the same amount of RAM.
  The local machine runs Windows and production runs Linux.
 
  --
  Regards,
  Umang V.Shah
  BI-ETL Developer



 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade




-- 
Regards,
Umang V.Shah
+919886829019


DevCenter and Cassandra 2.1

2014-09-26 Thread Andrew Cobley
Hi all,

I notice that DevCenter 1.1.1 doesn’t support user-defined types (as far as I
can see).  Is it just a matter of importing a template, or will we need to wait
for full 2.1 support in DevCenter?

Andy


The University of Dundee is a registered Scottish Charity, No: SC015096


Re: DevCenter and Cassandra 2.1

2014-09-26 Thread Alex Popescu
Hi Andrew,

DevCenter has a complete CQL parser inside, which helps with the offline
validation and suggestions. So the bad news is that it requires a new
version for every CQL grammar change.
The good news is that this wait is not going to be too long (I cannot talk
yet about a specific release date, but it's getting there).

On Fri, Sep 26, 2014 at 2:13 AM, Andrew Cobley a.e.cob...@dundee.ac.uk
wrote:

 Hi all,

 I notice that DevCenter 1.1.1 doesn’t support user-defined types (as far as
 I can see).  Is it just a matter of importing a template or will we need to
 wait for full 2.1 support in DevCenter?

 Andy


 The University of Dundee is a registered Scottish Charity, No: SC015096




-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


How to setup Cassandra client-to-node encryption

2014-09-26 Thread Lu, Boying
Hi, All,

I use the following configuration (in yaml file) to enable the client-to-node 
encryption:
client_encryption_options:
    enabled: true
    keystore: path-to-keystore-file
    keystore_password: some-password
    truststore: path-to-truststore-file
    truststore_password: some-password

But when Cassandra starts, I get the following error:

Caused by: org.apache.thrift.transport.TTransportException: Could not bind to port 9160
    at org.apache.thrift.transport.TSSLTransportFactory.createServer(TSSLTransportFactory.java:117)
    at org.apache.thrift.transport.TSSLTransportFactory.getServerSocket(TSSLTransportFactory.java:103)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:253)
    ... 6 more
Caused by: java.lang.IllegalArgumentException: Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers
    at sun.security.ssl.CipherSuiteList.<init>(CipherSuiteList.java:92)
    at sun.security.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:191)
    at org.apache.thrift.transport.TSSLTransportFactory.createServer(TSSLTransportFactory.java:113)
    ... 8 more

Does anyone know the root cause?

Thanks a lot.

Boying



Re: How to setup Cassandra client-to-node encryption

2014-09-26 Thread Bulat Shakirzyanov
Hi,

You need to install JCE -
http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html

Bulat

On Sep 26, 2014, at 7:58, Lu, Boying boying...@emc.com wrote:

Hi, All,



I use the following configuration (in yaml file) to enable the
client-to-node encryption:

client_encryption_options:

enabled: true

keystore: path-to-keystore-file

keystore_password: some-password

truststore: path-to-truststore-file

truststore_password: some-password



But when Cassandra starts, I got following error:

Caused by: org.apache.thrift.transport.TTransportException: Could not bind
to port 9160

at
org.apache.thrift.transport.TSSLTransportFactory.createServer(TSSLTransportFactory.java:117)

at
org.apache.thrift.transport.TSSLTransportFactory.getServerSocket(TSSLTransportFactory.java:103)

at
org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:253)

... 6 more

Caused by: java.lang.IllegalArgumentException: Cannot support
TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers

at sun.security.ssl.CipherSuiteList.init(CipherSuiteList.java:92)

at
sun.security.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:191)

at
org.apache.thrift.transport.TSSLTransportFactory.createServer(TSSLTransportFactory.java:113)

... 8 more



Does anyone know the root cause?



Thanks a lot.



Boying


RE: How to setup Cassandra client-to-node encryption

2014-09-26 Thread Lu, Boying
Thanks a lot.  I’ll try it.

From: Bulat Shakirzyanov [mailto:mallluh...@gmail.com]
Sent: September 26, 2014 23:58
To: user@cassandra.apache.org
Subject: Re: How to setup Cassandra client-to-node encryption

Hi,

You need to install JCE - 
http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html

Bulat

On Sep 26, 2014, at 7:58, Lu, Boying boying...@emc.com wrote:
Hi, All,

I use the following configuration (in yaml file) to enable the client-to-node 
encryption:
client_encryption_options:
enabled: true
keystore: path-to-keystore-file
keystore_password: some-password
truststore: path-to-truststore-file
truststore_password: some-password

But when Cassandra starts, I got following error:
Caused by: org.apache.thrift.transport.TTransportException: Could not bind to 
port 9160
at 
org.apache.thrift.transport.TSSLTransportFactory.createServer(TSSLTransportFactory.java:117)
at 
org.apache.thrift.transport.TSSLTransportFactory.getServerSocket(TSSLTransportFactory.java:103)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$Factory.buildTServer(CustomTThreadPoolServer.java:253)
... 6 more
Caused by: java.lang.IllegalArgumentException: Cannot support 
TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers
at sun.security.ssl.CipherSuiteList.init(CipherSuiteList.java:92)
at 
sun.security.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:191)
at 
org.apache.thrift.transport.TSSLTransportFactory.createServer(TSSLTransportFactory.java:113)
... 8 more

Does anyone know the root cause?

Thanks a lot.

Boying



Re: using dynamic cell names in CQL 3

2014-09-26 Thread Brice Dutheil
I’m not sure I understand correctly “for example column.name would be
event_name(temperature)”. What I gather, however, is that you have multiple
events that may or may not have certain properties; in your example I
believe you mean you want a CF for events with a type event_name that
contains a column temperature?!

You can model it like that :

CREATE TABLE events (
  name text,
  metric text,
  value text,
  PRIMARY KEY (name, metric)
)

Where

   - name is the row key, for each kind (or name) of event
   - metric is the column name, aka the clustering key

For example when inserting

INSERT INTO events (name, metric, value) VALUES ('captor',
'temperature', '25 ºC');
INSERT INTO events (name, metric, value) VALUES ('captor', 'wind', '5 km/h');
INSERT INTO events (name, metric, value) VALUES ('captor',
'atmosphere', '1013 millibars');

INSERT INTO events (name, metric, value) VALUES ('cpu', 'temperature', '70 ºC');
INSERT INTO events (name, metric, value) VALUES ('cpu', 'frequency',
'1015,7 MHz');

You will have something like this (one storage row per event name, with one
column per metric):

             temperature   atmosphere       wind     frequency
  captor     25 ºC         1013 millibars   5 km/h
  cpu        70 ºC                                   1015,7 MHz

cqlsh presents each clustering key as a separate row, which is not how the
column family is physically stored.
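
For completeness, reading all the metrics of one event back is then a
single-partition query; a minimal sketch against the table above:

-- Fetch every metric stored under the 'captor' event in one read.
SELECT metric, value FROM events WHERE name = 'captor';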

The model I give is just an example, as you may want to model things
differently according to your use cases. Time is probably part of them, and
will probably be in the clustering key too.

Note that if you create wide rows and you have *a lot* of data, you may
want to bucket the CF per time period (month / week / day / etc).
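
A bucketed variant could look like the sketch below (only an illustration;
the table name and the day-level bucket are made up, and the right
granularity depends on the write rate):

-- Hypothetical time-bucketed layout: one partition per (event name, day).
CREATE TABLE events_by_day (
  name   text,
  day    text,      -- bucket, e.g. '2014-09-26'
  metric text,
  value  text,
  PRIMARY KEY ((name, day), metric)
);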

HTH

— Brice

On Thu, Sep 25, 2014 at 3:13 PM, shahab shahab.mok...@gmail.com wrote:

Thanks,
 It seems that I was not clear in my question, I would like to store values
 in the column name, for example column.name would be event_name
 (temperature) and column-content would be the respective value (e.g.
 40.5) . And I need to know how the schema should look like in CQL 3

 best,
 /Shahab


 On Wed, Sep 24, 2014 at 1:49 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Dynamic thing in Thrift ≈ clustering columns in CQL

 Can you give more details about your data model ?

 On Wed, Sep 24, 2014 at 1:11 PM, shahab shahab.mok...@gmail.com wrote:

 Hi,

 I  would like to define schema for a table where the column (cell) names
 are defined dynamically. Apparently there is a way to do this in Thrift (
 http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
 )

 but i couldn't find how i can do the same using CQL?

 Any resource/example that I can look at ?


 best,
 /Shahab





Repair taking long time

2014-09-26 Thread Gene Robichaux
I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in 
another.

Running a repair on a large column family seems to be moving much slower than I 
expect.

Looking at nodetool compactionstats, it indicates that a Validation phase is
running and that the total bytes is 4.5 TB (4505336278756).

This is a very large CF. The process has been running for 2.5 hours and has 
processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate 
it will take 158 hours, just shy of 1 week.

Is this reasonable? This is my first large repair and I am wondering if this is 
normal for a CF of this size. Seems like a long time to me.

Is it possible to tune this process to speed it up? Is there something in my 
configuration that could be causing this slow performance? I am running HDDs, 
not SSDs in a JBOD configuration.



Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
Phone: 214-576-3273



Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
This problem is addressed in 2.1.

On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
gene.robich...@match.com wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in
 another.



 Running a repair on a large column family seems to be moving much slower
 than I expect.



 Looking at nodetool compaction stats it indicates the Validation phase is
 running that the total bytes is 4.5T (4505336278756).



 This is a very large CF. The process has been running for 2.5 hours and has
 processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
 rate it will take 158 hours, just shy of 1 week.



 Is this reasonable? This is my first large repair and I am wondering if this
 is normal for a CF of this size. Seems like a long time to me.



 Is it possible to tune this process to speed it up? Is there something in my
 configuration that could be causing this slow performance? I am running
 HDDs, not SSDs in a JBOD configuration.







 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273





-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Repair taking long time

2014-09-26 Thread Brice Dutheil
Unfortunately DSE 4.5.0 is still on 2.0.x

-- Brice

On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
 This problem is addressed in 2.1.

 On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
 gene.robich...@match.com wrote:
  I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and
 4 in
  another.
 
 
 
  Running a repair on a large column family seems to be moving much slower
  than I expect.
 
 
 
  Looking at nodetool compaction stats it indicates the Validation phase is
  running that the total bytes is 4.5T (4505336278756).
 
 
 
  This is a very large CF. The process has been running for 2.5 hours and
 has
  processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
  rate it will take 158 hours, just shy of 1 week.
 
 
 
  Is this reasonable? This is my first large repair and I am wondering if
 this
  is normal for a CF of this size. Seems like a long time to me.
 
 
 
  Is it possible to tune this process to speed it up? Is there something
 in my
  configuration that could be causing this slow performance? I am running
  HDDs, not SSDs in a JBOD configuration.
 
 
 
 
 
 
 
  Gene Robichaux
 
  Manager, Database Operations
 
  Match.com
 
  8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
 
  Phone: 214-576-3273
 
 



 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade



Re: Repair taking long time

2014-09-26 Thread Bryan Talbot
With a 4.5 TB table and just 4 nodes, repair will likely take forever for
any version.

-Bryan


On Fri, Sep 26, 2014 at 10:40 AM, Jonathan Haddad j...@jonhaddad.com wrote:

 Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
 This problem is addressed in 2.1.

 On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
 gene.robich...@match.com wrote:
  I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and
 4 in
  another.
 
 
 
  Running a repair on a large column family seems to be moving much slower
  than I expect.
 
 
 
  Looking at nodetool compaction stats it indicates the Validation phase is
  running that the total bytes is 4.5T (4505336278756).
 
 
 
  This is a very large CF. The process has been running for 2.5 hours and
 has
  processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
  rate it will take 158 hours, just shy of 1 week.
 
 
 
  Is this reasonable? This is my first large repair and I am wondering if
 this
  is normal for a CF of this size. Seems like a long time to me.
 
 
 
  Is it possible to tune this process to speed it up? Is there something
 in my
  configuration that could be causing this slow performance? I am running
  HDDs, not SSDs in a JBOD configuration.
 
 
 
 
 
 
 
  Gene Robichaux
 
  Manager, Database Operations
 
  Match.com
 
  8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
 
  Phone: 214-576-3273
 
 



 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade



RE: Repair taking long time

2014-09-26 Thread Gene Robichaux
I am on DSE 4.0.3 which is 2.0.7.

If 4.5.1 is NOT 2.1, I guess an upgrade will not buy me much…

The bad thing is that this table is not our largest… :(


Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
Phone: 214-576-3273

From: Brice Dutheil [mailto:brice.duth...@gmail.com]
Sent: Friday, September 26, 2014 12:47 PM
To: user@cassandra.apache.org
Subject: Re: Repair taking long time

Unfortunately DSE 4.5.0 is still on 2.0.x

-- Brice

On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote:
Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
This problem is addressed in 2.1.

On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in
 another.



 Running a repair on a large column family seems to be moving much slower
 than I expect.



 Looking at nodetool compaction stats it indicates the Validation phase is
 running that the total bytes is 4.5T (4505336278756).



 This is a very large CF. The process has been running for 2.5 hours and has
 processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
 rate it will take 158 hours, just shy of 1 week.



 Is this reasonable? This is my first large repair and I am wondering if this
 is normal for a CF of this size. Seems like a long time to me.



 Is it possible to tune this process to speed it up? Is there something in my
 configuration that could be causing this slow performance? I am running
 HDDs, not SSDs in a JBOD configuration.







 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273




--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade



Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
If you're using DSE you might want to contact Datastax support, rather
than the ML.

On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux
gene.robich...@match.com wrote:
 I am on DSE 4.0.3 which is 2.0.7.



 If 4.5.1 is NOT 2.1. I guess an upgrade will not buy me much…..



 The bad thing is that table is not our largest….. :(





 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273



 From: Brice Dutheil [mailto:brice.duth...@gmail.com]
 Sent: Friday, September 26, 2014 12:47 PM
 To: user@cassandra.apache.org
 Subject: Re: Repair taking long time



 Unfortunately DSE 4.5.0 is still on 2.0.x


 -- Brice



 On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
 This problem is addressed in 2.1.


 On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
 gene.robich...@match.com wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4
 in
 another.



 Running a repair on a large column family seems to be moving much slower
 than I expect.



 Looking at nodetool compaction stats it indicates the Validation phase is
 running that the total bytes is 4.5T (4505336278756).



 This is a very large CF. The process has been running for 2.5 hours and
 has
 processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
 rate it will take 158 hours, just shy of 1 week.



 Is this reasonable? This is my first large repair and I am wondering if
 this
 is normal for a CF of this size. Seems like a long time to me.



 Is it possible to tune this process to speed it up? Is there something in
 my
 configuration that could be causing this slow performance? I am running
 HDDs, not SSDs in a JBOD configuration.







 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade





-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


RE: Repair taking long time

2014-09-26 Thread Gene Robichaux
Using their community edition… no support (yet!) :(

Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
Phone: 214-576-3273

-Original Message-
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of 
Jonathan Haddad
Sent: Friday, September 26, 2014 12:58 PM
To: user@cassandra.apache.org
Subject: Re: Repair taking long time

If you're using DSE you might want to contact Datastax support, rather than the 
ML.

On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux gene.robich...@match.com 
wrote:
 I am on DSE 4.0.3 which is 2.0.7.



 If 4.5.1 is NOT 2.1. I guess an upgrade will not buy me much…..



 The bad thing is that table is not our largest….. :(





 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273



 From: Brice Dutheil [mailto:brice.duth...@gmail.com]
 Sent: Friday, September 26, 2014 12:47 PM
 To: user@cassandra.apache.org
 Subject: Re: Repair taking long time



 Unfortunately DSE 4.5.0 is still on 2.0.x


 -- Brice



 On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
 This problem is addressed in 2.1.


 On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux 
 gene.robich...@match.com wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC 
 and 4 in another.



 Running a repair on a large column family seems to be moving much 
 slower than I expect.



 Looking at nodetool compaction stats it indicates the Validation 
 phase is running that the total bytes is 4.5T (4505336278756).



 This is a very large CF. The process has been running for 2.5 hours 
 and has processed 71G (71950433062). That rate is about 28.4 GB per 
 hour. At this rate it will take 158 hours, just shy of 1 week.



 Is this reasonable? This is my first large repair and I am wondering 
 if this is normal for a CF of this size. Seems like a long time to 
 me.



 Is it possible to tune this process to speed it up? Is there 
 something in my configuration that could be causing this slow 
 performance? I am running HDDs, not SSDs in a JBOD configuration.







 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade





--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Repair taking long time

2014-09-26 Thread Jonathan Haddad
Well, in that case, you may want to roll your own script for doing
constant repairs of your cluster, and extend your gc_grace_seconds so
you can repair the whole cluster before the tombstones are cleared.
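
As an illustration only (the keyspace, table and value are hypothetical; the
right value depends on how long a full repair cycle actually takes),
gc_grace_seconds is changed per table in CQL:

-- Hypothetical example: allow roughly 14 days (in seconds) for a full
-- repair cycle before tombstones become eligible for purging.
ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 1209600;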

On Fri, Sep 26, 2014 at 11:15 AM, Gene Robichaux
gene.robich...@match.com wrote:
 Using their community edition..no support (yet!) :(

 Gene Robichaux
 Manager, Database Operations
 Match.com
 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
 Phone: 214-576-3273

 -Original Message-
 From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf 
 Of Jonathan Haddad
 Sent: Friday, September 26, 2014 12:58 PM
 To: user@cassandra.apache.org
 Subject: Re: Repair taking long time

 If you're using DSE you might want to contact Datastax support, rather than 
 the ML.

 On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux gene.robich...@match.com 
 wrote:
 I am on DSE 4.0.3 which is 2.0.7.



 If 4.5.1 is NOT 2.1. I guess an upgrade will not buy me much…..



 The bad thing is that table is not our largest….. :(





 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273



 From: Brice Dutheil [mailto:brice.duth...@gmail.com]
 Sent: Friday, September 26, 2014 12:47 PM
 To: user@cassandra.apache.org
 Subject: Re: Repair taking long time



 Unfortunately DSE 4.5.0 is still on 2.0.x


 -- Brice



 On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
 This problem is addressed in 2.1.


 On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
 gene.robich...@match.com wrote:
 I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC
 and 4 in another.



 Running a repair on a large column family seems to be moving much
 slower than I expect.



 Looking at nodetool compaction stats it indicates the Validation
 phase is running that the total bytes is 4.5T (4505336278756).



 This is a very large CF. The process has been running for 2.5 hours
 and has processed 71G (71950433062). That rate is about 28.4 GB per
 hour. At this rate it will take 158 hours, just shy of 1 week.



 Is this reasonable? This is my first large repair and I am wondering
 if this is normal for a CF of this size. Seems like a long time to
 me.



 Is it possible to tune this process to speed it up? Is there
 something in my configuration that could be causing this slow
 performance? I am running HDDs, not SSDs in a JBOD configuration.







 Gene Robichaux

 Manager, Database Operations

 Match.com

 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

 Phone: 214-576-3273




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade





 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-26 Thread Shing Hing Man
Hi,
I have run cassandra-stress write and cassandra-stress read on my office
PC and on my home PC.

Office PC : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD
Home PC   : Intel Xeon E3-1230V3, 8 virtual cores, 8G RAM, 500G SATA disk

From the cassandra-stress results (please see below), it seems Cassandra is
more than 100% faster on my home PC than on the office PC. I was expecting the
other way around, as my office PC has much better hardware.


Office : Intel Core i7-4479, 8 virtual cores, 16G RAM, 500G SSD

cauchy:~/installed/cassandra/tools/bin ./cassandra-stress write
Running with 8 threadCount
Results:
op rate   : 11264
partition rate: 11264
row rate  : 11264
latency mean  : 0.7
latency median: 0.4
latency 95th percentile   : 0.9
latency 99th percentile   : 1.6
latency 99.9th percentile : 5.3
latency max   : 325.3
Total operation time  : 00:02:40


cauchy:~/installed/cassandra/tools/bin ./cassandra-stress read 
Running with 8 threadCount
Results:
op rate   : 13702
partition rate: 13702
row rate  : 13702
latency mean  : 0.5
latency median: 0.5
latency 95th percentile   : 0.8
latency 99th percentile   : 1.4
latency 99.9th percentile : 3.4
latency max   : 67.1
Total operation time  : 00:00:30

---
--

Home: Intel Xeon E3-1230V3, 8 virtual core,  8G RAM, 500G SATA disk.

matmsh@gauss:~/installed/cassandra/tools/bin ./cassandra-stress write
Running with 8 threadCount

Results:
op rate   : 25181
partition rate: 25181
row rate  : 25181
latency mean  : 0.3
latency median: 0.2
latency 95th percentile   : 0.3
latency 99th percentile   : 0.5
latency 99.9th percentile : 16.7
latency max   : 331.0
Total operation time  : 00:03:24

gauss:~/installed/cassandra/tools/bin ./cassandra-stress read
Results:
op rate   : 35338
partition rate: 35338
row rate  : 35338
latency mean  : 0.2
latency median: 0.2
latency 95th percentile   : 0.3
latency 99th percentile   : 0.4
latency 99.9th percentile : 1.1
latency max   : 17.7
Total operation time  : 00:00:30


Are the above results expected?
Thanks in advance for any suggestions!

Shing

simple map / table scans without hadoop?

2014-09-26 Thread Kevin Burton
I have a requirement to periodically run full table scans on our data.
It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer to do
it in Java because I only need something mildly trivial.

Pig / hadoop / etc are mildly overkill for this.  I don’t want or need a
whole hadoop or HDFS setup for this.

For example, a full table scan, and if a field matches a regex, set another
column based on that value.

Seems like this wouldn’t be too hard.  Just write a daemon that looks at
the key distribution and runs a scan on the data closest to it.  It would
be ideal if it was in a separate daemon so that you couldn’t accidentally
read all that data into memory and then OOM the Cassandra daemon.

Does this already exist?
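
For what it's worth, the CQL-level building block such a daemon would
probably use is paging through token ranges; a rough sketch, with the table
t, the columns, and the partition key pk all being hypothetical:

-- Scan one token range at a time (for example the ranges owned by the
-- local node) rather than issuing a single unbounded full-table SELECT.
SELECT pk, some_col FROM t
 WHERE token(pk) > -9223372036854775808
   AND token(pk) <= -4611686018427387904;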

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-26 Thread Kevin Burton
What SSD was it?  There is a lot of variability in terms of SSD
performance.

1.  Is it a new vs old SSD?  Old SSDs can become slower if they’re really
worn out

2.  was the office SSD near capacity holding other data?

3.  what models were they?

SSD != SSD… there is a massive amount of performance variability out there.

… also … more data is needed.  JDK versions the same?  cassandra versions
the same?

what about the config?

On Fri, Sep 26, 2014 at 2:39 PM, Shing Hing Man mat...@yahoo.com wrote:

 Hi,
   I have run   cassandra-stress write and  cassandra-stress read  on my
 office PC and on my home PC.

 Office PC : Intel Core i7-4479, 8 virtual core, 16G RAM, 500G SSD Home PC
 : Intel Xeon E3-1230V3, 8 virtual core, 8G RAM, 500G SATA disk.

 From the cassandra-stress result (please see below), it seems Cassandra is 
 more
 than 100% performant on my home PC than the office PC. I am expecting the
 other way around, as my office PC has much better hardware.

 Office : Intel Core i7-4479, 9 virtual cores, 16G RAM, 500G SSD
  cauchy:~/installed/cassandra/tools/bin ./cassandra-stress write
 Running with 8 threadCount
 Results:
 op rate : 11264
 partition rate : 11264
 row rate : 11264
 latency mean : 0.7
 latency median : 0.4
 latency 95th percentile : 0.9
 latency 99th percentile : 1.6
 latency 99.9th percentile : 5.3
 latency max : 325.3
 Total operation time : 00:02:40


 cauchy:~/installed/cassandra/tools/bin ./cassandra-stress read
 Running with 8 threadCount
 Results:
 op rate : 13702
 partition rate : 13702
 row rate : 13702
 latency mean : 0.5
 latency median : 0.5
 latency 95th percentile : 0.8
 latency 99th percentile : 1.4
 latency 99.9th percentile : 3.4
 latency max : 67.1
 Total operation time : 00:00:30

 ---
 --
 Home : Intel Xeon E3-1230V3, 8 virtual core, 8G RAM, 500G SATA disk.

 matmsh@gauss:~/installed/cassandra/tools/bin ./cassandra-stress write
 Running with 8 threadCount

 Results:
 op rate : 25181
 partition rate : 25181
 row rate : 25181
 latency mean : 0.3
 latency median : 0.2
 latency 95th percentile : 0.3
 latency 99th percentile : 0.5
 latency 99.9th percentile : 16.7
 latency max : 331.0
 Total operation time : 00:03:24

 gauss:~/installed/cassandra/tools/bin ./cassandra-stress read
   Results:
 op rate : 35338
 partition rate : 35338
 row rate : 35338
 latency mean : 0.2
 latency median : 0.2
 latency 95th percentile : 0.3
 latency 99th percentile : 0.4
 latency 99.9th percentile : 1.1
 latency max : 17.7
 Total operation time : 00:00:30


 Is the above result expected ?
 Thanks in advance for any suggestions !

 Shing





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com