Multiget performance

2014-04-09 Thread Allan C
Hi all,

I’ve always been told that multigets are a Cassandra anti-pattern for 
performance reasons. I ran a quick test tonight to prove it to myself, and, 
sure enough, slowness ensued. It takes about 150ms to get 100 keys for my use 
case. Not terrible, but at least an order of magnitude from what I need it to 
be.

So far, I’ve been able to denormalize and not have any problems. Today, I ran 
into a use case where denormalization introduces a huge amount of complexity to 
the code.

It’s very tempting to cache a subset in Redis and call it a day — probably 
will. But, that’s not a very satisfying answer. It’s only about 5GB of data and 
it feels like I should be able to tune a Cassandra CF to be within 2x.

The workload is around 70% reads. Most of the writes are updates to existing 
data. Currently, it’s in an LCS CF with ~30M rows. The cluster is 300GB total 
with 3-way replication, running across 12 fairly large boxes with 16G RAM. All 
on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).


Has anyone had success getting good results for this kind of workload? Or, is 
Cassandra just not suited for it at all and I should just use an in-memory 
store?

-Allan

Re: binary protocol server side sockets

2014-04-09 Thread DuyHai Doan
Hello Graham

 You can use the following code with the official Java driver:

 SocketOptions socketOptions = new SocketOptions();
 socketOptions.setKeepAlive(true);

 Cluster.builder().addContactPoints(contactPointsList)
        .withPort(cql3Port)
        .withCompression(ProtocolOptions.Compression.SNAPPY)
        .withCredentials(cassandraUsername, cassandraPassword)
        .withSocketOptions(socketOptions)
        .build();

 or :

 
alreadyBuiltClusterInstance.getConfiguration().getSocketOptions().setKeepAlive(true);


 Although I'm not sure whether the second alternative works, because the
cluster is already built and maybe the connection is already established...

 Regards

 Duy Hai DOAN


On Wed, Apr 9, 2014 at 12:59 AM, graham sanderson gra...@vast.com wrote:

 Is there a way to configure KEEPALIVE on the server end sockets of the
 binary protocol.

 rpc_keepalive only affects thrift.

 This is on 2.0.5

 Thanks,

 Graham


Re: Multiget performance

2014-04-09 Thread Daniel Chia
Are you making the 100 calls in serial, or in parallel?

Thanks,
Daniel


On Tue, Apr 8, 2014 at 11:22 PM, Allan C alla...@gmail.com wrote:

 Hi all,

 I've always been told that multigets are a Cassandra anti-pattern for
 performance reasons. I ran a quick test tonight to prove it to myself, and,
 sure enough, slowness ensued. It takes about 150ms to get 100 keys for my
 use case. Not terrible, but at least an order of magnitude from what I need
 it to be.

 So far, I've been able to denormalize and not have any problems. Today, I
 ran into a use case where denormalization introduces a huge amount of
 complexity to the code.

 It's very tempting to cache a subset in Redis and call it a day -- probably
 will. But, that's not a very satisfying answer. It's only about 5GB of data
 and it feels like I should be able to tune a Cassandra CF to be within 2x.

 The workload is around 70% reads. Most of the writes are updates to
 existing data. Currently, it's in an LCS CF with ~30M rows. The cluster is
 300GB total with 3-way replication, running across 12 fairly large boxes
 with 16G RAM. All on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).


 Has anyone had success getting good results for this kind of workload? Or,
 is Cassandra just not suited for it at all and I should just use an
 in-memory store?

 -Allan



RE: Commit logs building up

2014-04-09 Thread Parag Patel
Nate,

What values on the FlushWriter line would concern you?  What is the 
difference between Blocked and All Time Blocked?

Parag

From: Nate McCall [mailto:n...@thelastpickle.com]
Sent: Thursday, February 27, 2014 4:22 PM
To: Cassandra Users
Subject: Re: Commit logs building up

What was the impetus for turning up the commitlog_segment_size_in_mb?

Also, in nodetool tpstats, what are the values for the FlushWriter line?

On Wed, Feb 26, 2014 at 12:18 PM, Christopher Wirt 
chris.w...@struq.com wrote:
We're running 2.0.5, recently upgraded from 1.2.14.

Sometimes we are seeing CommitLogs starting to build up.

Is this a potential bug? Or a symptom of something else we can easily address?

We have
commitlog_sync: periodic
commitlog_sync_period_in_ms:1
commitlog_segment_size_in_mb: 512


Thanks,
Chris



--
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Commitlog questions

2014-04-09 Thread Parag Patel


1)  Why is the default commitlog size 4GB?  Has anyone changed this? What are some aspects 
to consider when determining the commitlog size?

2)  If the commitlog is in periodic mode, there is a property to set a time 
interval to flush the incoming mutations to disk.  This implies that there is a 
queue inside Cassandra to hold this data in memory until it is flushed.

a.   Is there a name for this queue?

b.  Is there a limit for this queue?

c.   Are there any tuning parameters for this queue?

Thanks,
Parag


[no subject]

2014-04-09 Thread Ben Hood
Hi all,

I'm getting the following error in a 2.0.6 instance:

ERROR [Native-Transport-Requests:16633] 2014-04-09 10:11:45,811
ErrorMessage.java (line 222) Unexpected exception during request
java.lang.AssertionError: localhost/127.0.0.1
at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:860)
at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:480)
at org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:524)
at org.apache.cassandra.cql3.statements.BatchStatement.executeWithoutConditions(BatchStatement.java:210)
at org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:203)
at org.apache.cassandra.cql3.statements.BatchStatement.executeWithPerStatementVariables(BatchStatement.java:192)
at org.apache.cassandra.cql3.QueryProcessor.processBatch(QueryProcessor.java:373)
at org.apache.cassandra.transport.messages.BatchMessage.execute(BatchMessage.java:206)
at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)
at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Looking at the source for this, it appears to be related to a timeout:

// local write that time out should be handled by LocalMutationRunnable
assert !target.equals(FBUtilities.getBroadcastAddress()) : target;

Cursory testing indicates that this occurs during larger batch ingests.

But the error does not appear to be propagated properly back to the
client and it seems like this could be due to some misconfiguration.

Has anybody seen something like this before?

Cheers,

Ben


nodetool repair loops version 2.0.6

2014-04-09 Thread Kevin McLaughlin
Have a test cluster with three nodes each in two datacenters.  The
following causes nodetool repair to go into an (apparent) infinite
loop.  This is with 2.0.6.

On node 10.140.140.101:

cqlsh> CREATE KEYSPACE looptest WITH replication = {
   ...   'class': 'NetworkTopologyStrategy',
   ...   '140': '2',
   ...   '141': '2'
   ... };

cqlsh> use looptest;

cqlsh:looptest> CREATE TABLE a_table (
            ...   id uuid,
            ...   description text,
            ...   PRIMARY KEY (id)
            ... );

cqlsh:looptest>

On node 10.140.140.102:

[default@unknown] describe cluster;

Cluster Information:

   Name: Dev Cluster

   Snitch: org.apache.cassandra.locator.RackInferringSnitch

   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner

   Schema versions:

e7c46d59-fceb-38b5-947c-dcbd14950a4c: [10.141.140.101, 10.140.140.101,
10.140.140.102, 10.141.140.103, 10.141.140.102, 10.140.140.103]

nodetool status:

Datacenter: 141

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns   Host ID
 Rack

UN  10.141.140.101  25.09 MB   256 15.6%
3f0d60bf-dfcd-42a9-9cff-8b76146359e3  140

UN  10.141.140.102  27.83 MB   256 16.7%
bbdcc640-278e-4d3d-ac12-fcb4d837d0e1  140

UN  10.141.140.103  23.78 MB   256 16.5%
b030e290-b8da-4883-a13d-b2529fab37fe  140

Datacenter: 140

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns   Host ID
 Rack

UN  10.140.140.103  65.26 MB   256 18.1%
52a9a718-2bed-4972-ab11-bd97a8d8539c  140

UN  10.140.140.101  69.46 MB   256 17.6%
d59300db-6179-484e-9ca1-8d1eada0701a  140

UN  10.140.140.102  68.08 MB   256 15.4%
22e504c9-1cc6-4744-b302-32bb5116d409  140


Back on 10.140.140.101:

"nodetool repair looptest" never returns.  Looking in the system.log,
it is continuously looping with:

INFO [AntiEntropySessions:818] 2014-04-09 13:23:31,889
RepairSession.java (line 282) [repair
#24b2b1b0-bfea-11e3-85a3-911072ba5322] session completed successfully

 INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916
RepairSession.java (line 244) [repair
#253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync
/10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on
range (-4377479664111251829,-4360027703686042340] for
looptest.[a_table]

 INFO [AntiEntropyStage:1] 2014-04-09 13:23:31,949 RepairSession.java
(line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received
merkle tree for a_table from /10.141.140.102

 INFO [RepairJobTask:3] 2014-04-09 13:23:32,002 RepairJob.java (line
134) [repair #253687b0-bfea-11e3-85a3-911072ba5322] requesting merkle
trees for a_table (to [/10.141.140.103, /10.140.140.103,
/10.141.140.102, /10.140.140.101])

 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,007 RepairSession.java
(line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received
merkle tree for a_table from /10.140.140.101

 INFO [RepairJobTask:3] 2014-04-09 13:23:32,012 Differencer.java (line
67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
/10.141.140.101 and /10.140.140.103 are consistent for a_table

 INFO [RepairJobTask:2] 2014-04-09 13:23:32,016 Differencer.java (line
67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
/10.141.140.101 and /10.140.140.101 are consistent for a_table

 INFO [RepairJobTask:1] 2014-04-09 13:23:32,016 Differencer.java (line
67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
/10.141.140.101 and /10.141.140.102 are consistent for a_table

 INFO [RepairJobTask:4] 2014-04-09 13:23:32,016 Differencer.java (line
67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
/10.140.140.103 and /10.141.140.102 are consistent for a_table

 INFO [RepairJobTask:5] 2014-04-09 13:23:32,016 Differencer.java (line
67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
/10.140.140.103 and /10.140.140.101 are consistent for a_table

 INFO [RepairJobTask:6] 2014-04-09 13:23:32,016 Differencer.java (line
67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
/10.141.140.102 and /10.140.140.101 are consistent for a_table

 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,018 RepairSession.java
(line 221) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] a_table is
fully synced

 INFO [AntiEntropySessions:817] 2014-04-09 13:23:32,019
RepairSession.java (line 282) [repair
#24e867b0-bfea-11e3-85a3-911072ba5322] session completed successfully

 INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043
RepairSession.java (line 244) [repair
#2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync
/10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on
range (-3457228189350977014,-3443426249422196914] for
looptest.[a_table]

 INFO [RepairJobTask:3] 2014-04-09 13:23:32,169 RepairJob.java (line
134) [repair #2549c190-bfea-11e3-85a3-911072ba5322] requesting merkle
trees for a_table (to [/10.141.140.103, /10.140.140.102,
/10.141.140.102, 

Re: Apache cassandra not joining cluster ring

2014-04-09 Thread Joyabrata Das
Hello All,

Kindly help with below issues, I'm really stuck here.

Thanks,
Joy

On 8 April 2014 21:55, Joyabrata Das joy.luv.challen...@gmail.com wrote:

 Hello,

 I've a four node apache cassandra community 1.2 cluster in single
 datacenter with a seed.
 All configurations are similar in cassandra.yaml file.
 The following issues are faced, please help.

 1] Though the fourth node isn't listed in the nodetool ring or status command,
 system.log shows only that this node isn't communicating via the gossip
 protocol with the other nodes.
 However, both the JMX & Telnet ports are enabled, with the proper listen/seed
 address configured.

 2] Though OpsCenter is able to recognize all four nodes, the agents are
 not getting installed from OpsCenter.
 However, the same JVM version is installed, and JAVA_HOME is also set, on
 all four nodes.

 Further observed that the problematic node has Ubuntu 64-bit & the other nodes
 are Ubuntu 32-bit; can that be the reason?

 Thanks,
 Joy



Re: nodetool repair loops version 2.0.6

2014-04-09 Thread Kevin McLaughlin
In fact, it did eventually finish in ~20 minutes.  Is this duration
expected/normal?

--Kevin


On Wed, Apr 9, 2014 at 9:32 AM, Kevin McLaughlin kmcla...@gmail.com wrote:
 Have a test cluster with three nodes each in two datacenters.  The
 following causes nodetool repair to go into an (apparent) infinite
 loop.  This is with 2.0.6.

 On node 10.140.140.101:

 cqlsh> CREATE KEYSPACE looptest WITH replication = {
    ...   'class': 'NetworkTopologyStrategy',
    ...   '140': '2',
    ...   '141': '2'
    ... };

 cqlsh> use looptest;

 cqlsh:looptest> CREATE TABLE a_table (
             ...   id uuid,
             ...   description text,
             ...   PRIMARY KEY (id)
             ... );

 cqlsh:looptest>

 On node 10.140.140.102:

 [default@unknown] describe cluster;

 Cluster Information:

Name: Dev Cluster

Snitch: org.apache.cassandra.locator.RackInferringSnitch

Partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Schema versions:

 e7c46d59-fceb-38b5-947c-dcbd14950a4c: [10.141.140.101, 10.140.140.101,
 10.140.140.102, 10.141.140.103, 10.141.140.102, 10.140.140.103]

 nodetool status:

 Datacenter: 141

 ===

 Status=Up/Down

 |/ State=Normal/Leaving/Joining/Moving

 --  Address Load   Tokens  Owns   Host ID
  Rack

 UN  10.141.140.101  25.09 MB   256 15.6%
 3f0d60bf-dfcd-42a9-9cff-8b76146359e3  140

 UN  10.141.140.102  27.83 MB   256 16.7%
 bbdcc640-278e-4d3d-ac12-fcb4d837d0e1  140

 UN  10.141.140.103  23.78 MB   256 16.5%
 b030e290-b8da-4883-a13d-b2529fab37fe  140

 Datacenter: 140

 ===

 Status=Up/Down

 |/ State=Normal/Leaving/Joining/Moving

 --  Address Load   Tokens  Owns   Host ID
  Rack

 UN  10.140.140.103  65.26 MB   256 18.1%
 52a9a718-2bed-4972-ab11-bd97a8d8539c  140

 UN  10.140.140.101  69.46 MB   256 17.6%
 d59300db-6179-484e-9ca1-8d1eada0701a  140

 UN  10.140.140.102  68.08 MB   256 15.4%
 22e504c9-1cc6-4744-b302-32bb5116d409  140


 Back on 10.140.140.101:

 "nodetool repair looptest" never returns.  Looking in the system.log,
 it is continuously looping with:

 INFO [AntiEntropySessions:818] 2014-04-09 13:23:31,889
 RepairSession.java (line 282) [repair
 #24b2b1b0-bfea-11e3-85a3-911072ba5322] session completed successfully

  INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916
 RepairSession.java (line 244) [repair
 #253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync
 /10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on
 range (-4377479664111251829,-4360027703686042340] for
 looptest.[a_table]

  INFO [AntiEntropyStage:1] 2014-04-09 13:23:31,949 RepairSession.java
 (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received
 merkle tree for a_table from /10.141.140.102

  INFO [RepairJobTask:3] 2014-04-09 13:23:32,002 RepairJob.java (line
 134) [repair #253687b0-bfea-11e3-85a3-911072ba5322] requesting merkle
 trees for a_table (to [/10.141.140.103, /10.140.140.103,
 /10.141.140.102, /10.140.140.101])

  INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,007 RepairSession.java
 (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received
 merkle tree for a_table from /10.140.140.101

  INFO [RepairJobTask:3] 2014-04-09 13:23:32,012 Differencer.java (line
 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
 /10.141.140.101 and /10.140.140.103 are consistent for a_table

  INFO [RepairJobTask:2] 2014-04-09 13:23:32,016 Differencer.java (line
 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
 /10.141.140.101 and /10.140.140.101 are consistent for a_table

  INFO [RepairJobTask:1] 2014-04-09 13:23:32,016 Differencer.java (line
 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
 /10.141.140.101 and /10.141.140.102 are consistent for a_table

  INFO [RepairJobTask:4] 2014-04-09 13:23:32,016 Differencer.java (line
 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
 /10.140.140.103 and /10.141.140.102 are consistent for a_table

  INFO [RepairJobTask:5] 2014-04-09 13:23:32,016 Differencer.java (line
 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
 /10.140.140.103 and /10.140.140.101 are consistent for a_table

  INFO [RepairJobTask:6] 2014-04-09 13:23:32,016 Differencer.java (line
 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints
 /10.141.140.102 and /10.140.140.101 are consistent for a_table

  INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,018 RepairSession.java
 (line 221) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] a_table is
 fully synced

  INFO [AntiEntropySessions:817] 2014-04-09 13:23:32,019
 RepairSession.java (line 282) [repair
 #24e867b0-bfea-11e3-85a3-911072ba5322] session completed successfully

  INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043
 RepairSession.java (line 244) [repair
 #2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync
 /10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on
 range 

Re: Apache cassandra not joining cluster ring

2014-04-09 Thread Jonathan Lacefield
Hello

   The nodetool status that you mentioned, was that executed on the 4th
node itself?  Also, what does netstat display?  Are the correct ports
listening on that node?

   Per OpsCenter, what version of OpsCenter are you using?  Are you able to
manually start the agents on the nodes themselves?

On Apr 9, 2014, at 6:57 AM, Joyabrata Das joy.luv.challen...@gmail.com
wrote:

Hello All,

Kindly help with below issues, I'm really stuck here.

Thanks,
Joy

On 8 April 2014 21:55, Joyabrata Das joy.luv.challen...@gmail.com wrote:

 Hello,

 I've a four node apache cassandra community 1.2 cluster in single
 datacenter with a seed.
 All configurations are similar in cassandra.yaml file.
 The following issues are faced, please help.

 1] Though the fourth node isn't listed in the nodetool ring or status command,
 system.log shows only that this node isn't communicating via the gossip
 protocol with the other nodes.
 However, both the JMX & Telnet ports are enabled, with the proper listen/seed
 address configured.

 2] Though OpsCenter is able to recognize all four nodes, the agents are
 not getting installed from OpsCenter.
 However, the same JVM version is installed, and JAVA_HOME is also set, on
 all four nodes.

 Further observed that the problematic node has Ubuntu 64-bit & the other nodes
 are Ubuntu 32-bit; can that be the reason?

 Thanks,
 Joy



Re: Apache cassandra not joining cluster ring

2014-04-09 Thread Michael Shuler
As Jonathan also asked for various details, perhaps it would be 
helpful to be very specific about who, what, when, where, why, what you 
tried, actual errors, versions, pastebins of configs, etc. Provide the 
things that might be needed for people to help you out.

For instance, the statement that "All configurations are similar in 
cassandra.yaml" means nothing to anyone on the list, if they can't see 
them to tell you, "oh, here on line XX, you have blah, and it should be 
blarg."


--
Kind regards,
Michael

On 04/09/2014 08:56 AM, Joyabrata Das wrote:

Hello All,

Kindly help with below issues, I'm really stuck here.

Thanks,
Joy

On 8 April 2014 21:55, Joyabrata Das joy.luv.challen...@gmail.com wrote:

Hello,

I've a four node apache cassandra community 1.2 cluster in single
datacenter with a seed.
All configurations are similar in cassandra.yaml file.
The following issues are faced, please help.

1] Though the fourth node isn't listed in the nodetool ring or status
command, system.log shows only that this node isn't communicating via
the gossip protocol with the other nodes.
However, both the JMX & Telnet ports are enabled, with the proper listen/seed
address configured.

2] Though OpsCenter is able to recognize all four nodes, the agents
are not getting installed from OpsCenter.
However, the same JVM version is installed, and JAVA_HOME is also
set, on all four nodes.

Further observed that the problematic node has Ubuntu 64-bit & the other
nodes are Ubuntu 32-bit; can that be the reason?

Thanks,
Joy






Re: Commitlog questions

2014-04-09 Thread Oleg Dulin

Parag:

To answer your questions:

1) The default is just that, a default. I wouldn't advise raising it 
though: the bigger it is, the longer it takes to restart the node.
2) I think they just use fsync. There is no queue. All files in 
Cassandra use java.nio buffers, but they need to be fsynced 
periodically. Look at the commitlog_sync parameters in the cassandra.yaml 
file; the comments there explain how it works. I believe the difference 
between periodic and batch is just that -- if it is periodic, it will 
fsync every 10 seconds; if it is batch, it will fsync whenever there were 
any changes within a time window.


On 2014-04-09 10:06:52 +, Parag Patel said:


 
1)  Why is the default commitlog size 4GB?  Has anyone changed this? What are some 
aspects to consider when determining the commitlog size?
2)  If the commitlog is in periodic mode, there is a property to 
set a time interval to flush the incoming mutations to disk.  This 
implies that there is a queue inside Cassandra to hold this data in 
memory until it is flushed.

a.   Is there a name for this queue?
b.  Is there a limit for this queue?
c.   Are there any tuning parameters for this queue?

 
Thanks,
Parag



--
Regards,
Oleg Dulin
http://www.olegdulin.com
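
Oleg's description suggests a simple mental model. Below is a minimal,
self-contained sketch of "periodic" syncing -- an illustration of the idea
only, not Cassandra's actual commitlog code: appends go straight into an
OS-buffered file channel and return immediately, while a background task
forces the buffered bytes to disk on a fixed interval (the analogue of
commitlog_sync_period_in_ms). There is no application-level queue to name,
limit, or tune; the OS page cache plays that role.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Conceptual sketch of periodic commitlog syncing. A crash can lose up
    // to one sync period of acknowledged writes, which is the documented
    // trade-off of periodic mode.
    public class PeriodicSyncLogSketch {
        private final FileChannel channel;

        public PeriodicSyncLogSketch(String path, long syncPeriodMs) throws IOException {
            channel = FileChannel.open(Paths.get(path),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            ScheduledExecutorService syncer = Executors.newSingleThreadScheduledExecutor();
            syncer.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    try {
                        channel.force(false); // fsync data (not metadata) to disk
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }, syncPeriodMs, syncPeriodMs, TimeUnit.MILLISECONDS);
        }

        public void append(ByteBuffer mutation) throws IOException {
            channel.write(mutation); // buffered by the OS until the next forced sync
        }
    }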




Re: Apache cassandra not joining cluster ring

2014-04-09 Thread Michael Shuler

On 04/08/2014 11:25 AM, Joyabrata Das wrote:

Further observed that the problematic node has Ubuntu 64-bit & the other nodes
are Ubuntu 32-bit; can that be the reason?


This may not be recommended, might/should(?) work, and may be a reason 
[0]. My first suggestion would be to remove this variable. This would 
also give you a chance to go through the steps of adding a new node to 
the cluster again [1] - you might stumble on something.


[0] 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/can-I-have-a-mix-of-32-and-64-bit-machines-in-a-cluster-td7583051.html
[1] 
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_add_node_to_cluster_t.html


--
Michael


Re: Multiget performance

2014-04-09 Thread Allan C
As one CQL statement:

 SELECT * from Event WHERE key IN ([100 keys]);

-Allan

On April 9, 2014 at 12:52:13 AM, Daniel Chia (danc...@coursera.org) wrote:

Are you making the 100 calls in serial, or in parallel?

Thanks,
Daniel


On Tue, Apr 8, 2014 at 11:22 PM, Allan C alla...@gmail.com wrote:
Hi all,

I’ve always been told that multigets are a Cassandra anti-pattern for 
performance reasons. I ran a quick test tonight to prove it to myself, and, 
sure enough, slowness ensued. It takes about 150ms to get 100 keys for my use 
case. Not terrible, but at least an order of magnitude from what I need it to 
be.

So far, I’ve been able to denormalize and not have any problems. Today, I ran 
into a use case where denormalization introduces a huge amount of complexity to 
the code.

It’s very tempting to cache a subset in Redis and call it a day — probably 
will. But, that’s not a very satisfying answer. It’s only about 5GB of data and 
it feels like I should be able to tune a Cassandra CF to be within 2x.

The workload is around 70% reads. Most of the writes are updates to existing 
data. Currently, it’s in an LCS CF with ~30M rows. The cluster is 300GB total 
with 3-way replication, running across 12 fairly large boxes with 16G RAM. All 
on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).


Has anyone had success getting good results for this kind of workload? Or, is 
Cassandra just not suited for it at all and I should just use an in-memory 
store?

-Allan
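
For reference, a sketch of the restructuring Daniel's serial-vs-parallel
question points at, using the DataStax Java driver's async API: rather than
one coordinator fanning a 100-key IN list out to many replicas and waiting
on the slowest one, issue the single-key reads concurrently and let token
awareness route each read to a replica that owns the key. The Event table
and key column are from Allan's query; the contact point and keyspace are
placeholders.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class ParallelMultiget {
        static List<Row> fetch(Session session, List<String> keys) {
            // Prepare once, bind per key; token awareness sends each read
            // straight to a node that owns that key.
            PreparedStatement ps = session.prepare("SELECT * FROM Event WHERE key = ?");
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (String key : keys) {
                futures.add(session.executeAsync(ps.bind(key))); // fire all reads at once
            }
            List<Row> rows = new ArrayList<Row>();
            for (ResultSetFuture future : futures) {
                Row row = future.getUninterruptibly().one(); // block only while collecting
                if (row != null) {
                    rows.add(row);
                }
            }
            return rows;
        }

        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1") // placeholder
                    .withLoadBalancingPolicy(
                            new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                    .build();
            Session session = cluster.connect("my_keyspace"); // placeholder
            System.out.println(fetch(session, Arrays.asList("k1", "k2")).size());
            cluster.close();
        }
    }

Whether this beats the single IN statement has to be measured, but it turns
one fan-out on one coordinator into many independent, replica-local reads.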



Re: binary protocol server side sockets

2014-04-09 Thread graham sanderson
Thanks, but I would think that just sets keepalive from the client end; I’m 
talking about the server end… this is one of those issues where there is 
something (e.g. switch, firewall, VPN) in between the client and the server, and 
we get left with orphaned established connections to the server when the client 
is gone.

On Apr 9, 2014, at 2:48 AM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Graham
 
  You can use the following code with the official Java driver:
 
  SocketOptions socketOptions = new SocketOptions();
  socketOptions.setKeepAlive(true);
 
  Cluster.builder().addContactPoints(contactPointsList)
 .withPort(cql3Port)
 .withCompression(ProtocolOptions.Compression.SNAPPY)
 .withCredentials(cassandraUsername, cassandraPassword)
 .withSocketOptions(socketOptions)
 .build();
 
  or :
 
  
 alreadyBuiltClusterInstance.getConfiguration().getSocketOptions().setKeepAlive(true);
 
 
  Although I'm not sure whether the second alternative works, because the 
 cluster is already built and maybe the connection is already established...
 
  Regards
 
  Duy Hai DOAN
 
 
 On Wed, Apr 9, 2014 at 12:59 AM, graham sanderson gra...@vast.com wrote:
 Is there a way to configure KEEPALIVE on the server end sockets of the binary 
 protocol.
 
 rpc_keepalive only affects thrift.
 
 This is on 2.0.5
 
 Thanks,
 
 Graham
 





Re: binary protocol server side sockets

2014-04-09 Thread Michael Shuler

On 04/09/2014 11:39 AM, graham sanderson wrote:

Thanks, but I would think that just sets keep alive from the client end;
I’m talking about the server end… this is one of those issues where
there is something (e.g. switch, firewall, VPN) in between the client and
the server, and we get left with orphaned established connections to the
server when the client is gone.


There would be no server setting for any service, not just c*, that 
would correct mis-configured connection-assassinating network gear 
between the client and server. Fix the gear to allow persistent connections.


Digging through the various timeouts in c*.yaml didn't lead me to a 
simple answer for something tunable, but I think this may be more basic 
networking related. I believe it's up to the client to keep the 
connection open as Duy indicated. I don't think c* will arbitrarily 
sever connections - something that disconnects the client may happen. In 
that case, the TCP connection on the server should drop to TIME_WAIT. Is 
this what you are seeing in `netstat -a` on the server - a bunch of 
TIME_WAIT connections hanging around? Those should eventually be 
recycled, but that's tunable in the network stack, if they are being 
generated at a high rate.


--
Michael


Re: binary protocol server side sockets

2014-04-09 Thread graham sanderson
Michael, it is not that the connections are being dropped, it is that the 
connections are not being dropped.

These server side sockets are ESTABLISHED, even though the client connection on 
the other side of the network device is long gone. This may well be an issue 
with the network device (it is valiantly trying to keep the connection alive it 
seems).

That said KEEPALIVE on the server side would not be a bad idea. At least then 
the OS on the server would eventually (probably after 2 hours of inactivity) 
attempt to ping the client. At that point hopefully something interesting would 
happen perhaps causing an error and destroying the server side socket (note 
KEEPALIVE is also good for preventing idle connections from being dropped by 
other network devices along the way)

rpc_keepalive on the server sets keep alive on the server side sockets for 
thrift, and is true by default

There doesn’t seem to be a setting for the native protocol

Note this isn’t a huge issue for us, they can be cleaned up by a rolling 
restart, and this particular case is not production, but related to 
development/testing against alpha by people working remotely over VPN - and it 
may well be the VPN's fault in this case… that said, and maybe this is a dev list 
question, it seems like the option to set keepalive should exist.

On Apr 9, 2014, at 12:25 PM, Michael Shuler mich...@pbandjelly.org wrote:

 On 04/09/2014 11:39 AM, graham sanderson wrote:
 Thanks, but I would think that just sets keep alive from the client end;
 I’m talking about the server end… this is one of those issues where
 there is something (e.g. switch, firewall, VPN) in between the client and
 the server, and we get left with orphaned established connections to the
 server when the client is gone.
 
 There would be no server setting for any service, not just c*, that would 
 correct mis-configured connection-assassinating network gear between the 
 client and server. Fix the gear to allow persistent connections.
 
 Digging through the various timeouts in c*.yaml didn't lead me to a simple 
 answer for something tunable, but I think this may be more basic networking 
 related. I believe it's up to the client to keep the connection open as Duy 
 indicated. I don't think c* will arbitrarily sever connections - something 
 that disconnects the client may happen. In that case, the TCP connection on 
 the server should drop to TIME_WAIT. Is this what you are seeing in `netstat 
 -a` on the server - a bunch of TIME_WAIT connections hanging around? Those 
 should eventually be recycled, but that's tunable in the network stack, if 
 they are being generated at a high rate.
 
 -- 
 Michael
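
For what it's worth, the knob graham is asking for would be roughly a
one-line option in the server's transport setup. Here is a hedged sketch in
netty 3 (the framework the native transport's stack traces show) of what
enabling SO_KEEPALIVE on accepted sockets looks like. Cassandra 2.0.x does
not expose such a setting for the binary protocol, so this illustrates the
requested behavior, not an existing configuration option; the port is the
usual native-protocol default.

    import java.net.InetSocketAddress;
    import java.util.concurrent.Executors;

    import org.jboss.netty.bootstrap.ServerBootstrap;
    import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

    // Sketch only: a netty 3 server enabling SO_KEEPALIVE on the sockets it
    // accepts. "child.*" options apply to the per-client connections, so the
    // server OS would eventually probe -- and tear down -- dead peers.
    public class KeepAliveServerSketch {
        public static void main(String[] args) {
            ServerBootstrap bootstrap = new ServerBootstrap(
                    new NioServerSocketChannelFactory(
                            Executors.newCachedThreadPool(),   // boss threads
                            Executors.newCachedThreadPool())); // worker threads
            bootstrap.setOption("child.keepAlive", true); // SO_KEEPALIVE on accepted sockets
            bootstrap.setOption("child.tcpNoDelay", true);
            bootstrap.bind(new InetSocketAddress(9042));
        }
    }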





Re: Commitlog questions

2014-04-09 Thread Robert Coli
On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel ppa...@clearpoolgroup.com wrote:

   some questions about the commitlog and related assumptions


https://issues.apache.org/jira/browse/CASSANDRA-6764

You might wish to get in contact with the reporter here, who has similar
questions!

=Rob


Re: Commit logs building up

2014-04-09 Thread Robert Coli
On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel ppa...@clearpoolgroup.com wrote:

 What values on the FlushWriter line would concern you?  What is
 the difference between Blocked and All Time Blocked?


Non-zero all time blocked. Because if the FlushWriter is blocked, you
probably don't have enough io to flush quickly enough.

Blocked is currently blocked, all time blocked is blocked since node
startup.

=Rob


Re: nodetool repair loops version 2.0.6

2014-04-09 Thread Robert Coli
On Wed, Apr 9, 2014 at 7:09 AM, Kevin McLaughlin kmcla...@gmail.com wrote:

 In fact, it did eventually finish in ~20 minutes.  Is this duration
 expected/normal?


https://issues.apache.org/jira/browse/CASSANDRA-5220

=Rob


Re: binary protocol server side sockets

2014-04-09 Thread Michael Shuler

On 04/09/2014 12:41 PM, graham sanderson wrote:

Michael, it is not that the connections are being dropped, it is that
the connections are not being dropped.


Thanks for the clarification.


These server side sockets are ESTABLISHED, even though the client
connection on the other side of the network device is long gone. This
may well be an issue with the network device (it is valiantly trying
to keep the connection alive it seems).


Have you tested if they *ever* time out on their own, or do they just 
keep sticking around forever? (maybe 432000 sec (120 hours), which is 
the default for nf_conntrack_tcp_timeout_established?) Trying out all 
the usage scenarios is really the way to track it down - directly on 
switch, behind/in front of firewall, on/off the VPN.



That said KEEPALIVE on the server side would not be a bad idea. At
least then the OS on the server would eventually (probably after 2
hours of inactivity) attempt to ping the client. At that point
hopefully something interesting would happen perhaps causing an error
and destroying the server side socket (note KEEPALIVE is also good
for preventing idle connections from being dropped by other network
devices along the way)


Tuning net.ipv4.tcp_keepalive_* could be helpful, if you know they 
timeout after 2 hours, which is the default.



rpc_keepalive on the server sets keep alive on the server side
sockets for thrift, and is true by default

There doesn’t seem to be a setting for the native protocol

Note this isn’t a huge issue for us, they can be cleaned up by a
rolling restart, and this particular case is not production, but
related to development/testing against alpha by people working
remotely over VPN - and it may well be the VPN's fault in this case…
that said, and maybe this is a dev list question, it seems like the
option to set keepalive should exist.


Yeah, but I agree you shouldn't have to restart to clean up connections 
- that's why I think it is lower in the network stack, and that a bit of 
troubleshooting and tuning might be helpful. That setting sounds like a 
good Jira request - keepalive may be the default, I'm not sure. :)


--
Michael


On Apr 9, 2014, at 12:25 PM, Michael Shuler mich...@pbandjelly.org
wrote:


On 04/09/2014 11:39 AM, graham sanderson wrote:

Thanks, but I would think that just sets keep alive from the
client end; I’m talking about the server end… this is one of
those issues where there is something (e.g. switch, firewall, VPN)
in between the client and the server, and we get left with
orphaned established connections to the server when the client is
gone.


There would be no server setting for any service, not just c*, that
would correct mis-configured connection-assassinating network gear
between the client and server. Fix the gear to allow persistent
connections.

Digging through the various timeouts in c*.yaml didn't lead me to a
simple answer for something tunable, but I think this may be more
basic networking related. I believe it's up to the client to keep
the connection open as Duy indicated. I don't think c* will
arbitrarily sever connections - something that disconnects the
client may happen. In that case, the TCP connection on the server
should drop to TIME_WAIT. Is this what you are seeing in `netstat
-a` on the server - a bunch of TIME_WAIT connections hanging
around? Those should eventually be recycled, but that's tunable in
the network stack, if they are being generated at a high rate.

-- Michael






Update SSTable fragmentation

2014-04-09 Thread Wayne Schroeder
I've been doing a lot of reading on SSTable fragmentation due to updates and 
the costs associated with reconstructing the end data from multiple SSTables 
that have been created over time and not yet compacted.  One question is stuck 
in my head: If you re-insert entire rows instead of updating one column, will 
cassandra end flushing that entire row into one SSTable on disk and then end up 
up finding a non fragmented entire row quickly on reads instead of potential 
reconstruction across multiple SSTables?  Obviously this has implications for 
space as a trade off.

Wayne



Per-keyspace partitioners?

2014-04-09 Thread Clint Kelly
Hi everyone,

Is there a way to change the partitioner on a per-table or per-keyspace
basis?

We have some tables for which we'd like to enable ordered scans of rows, so
we'd like to use the ByteOrdered partitioner for those, but use Murmur3 for
everything else in our cluster.

Is this possible?  Or does the partitioner have to be the same for the
entire cluster?

Best regards,
Clint


Re: Per-keyspace partitioners?

2014-04-09 Thread Jonathan Lacefield
Hello,

  Partitioner is per cluster.  We have seen users create separate clusters
for items like this, but that's an edge case.

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield

http://www.datastax.com/cassandrasummit14



On Wed, Apr 9, 2014 at 11:57 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Hi everyone,

 Is there a way to change the partitioner on a per-table or per-keyspace
 basis?

 We have some tables for which we'd like to enable ordered scans of rows,
 so we'd like to use the ByteOrdered partitioner for those, but use Murmur3
 for everything else in our cluster.

 Is this possible?  Or does the partitioner have to be the same for the
 entire cluster?

 Best regards,
 Clint
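
A common alternative that avoids ByteOrderedPartitioner entirely is to keep
Murmur3 and model the desired ordering as clustering columns: rows are
hash-distributed by partition key but stored sorted within each partition,
so ordered range scans inside a partition are cheap. A sketch with a
hypothetical sensor-readings schema (contact point and keyspace are
placeholders):

    import java.util.Date;
    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class OrderedWithinPartition {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            // sensor_id is the (hashed) partition key; reading_time is a
            // clustering column, so readings are stored in time order within
            // each sensor's partition.
            session.execute("CREATE TABLE IF NOT EXISTS demo.readings ("
                    + "sensor_id uuid, reading_time timestamp, value double, "
                    + "PRIMARY KEY (sensor_id, reading_time))");
            UUID sensor = UUID.randomUUID();
            // An ordered time-range scan, served from a single partition:
            ResultSet rs = session.execute(
                    "SELECT reading_time, value FROM demo.readings "
                            + "WHERE sensor_id = ? AND reading_time >= ? AND reading_time < ?",
                    sensor, new Date(0L), new Date());
            for (Row row : rs) {
                System.out.println(row.getDate("reading_time") + " -> " + row.getDouble("value"));
            }
            cluster.close();
        }
    }

This covers ordered scans within a partition; globally ordered scans across
all keys still need ByteOrderedPartitioner (with its hotspot problems) or
client-side merging over token ranges.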



Re: binary protocol server side sockets

2014-04-09 Thread graham sanderson
Thanks Michael,

Yup keepalive is not the default. It is possible they are going away after 
nf_conntrack_tcp_timeout_established; will have to do more digging (it is hard 
to tell how old a connection is - there are no visible timers (thru netstat) on 
an ESTABLISHED connection)…

This is actually low on my priority list, I was just spending a bit of time 
trying to track down the source of 

ERROR [Native-Transport-Requests:3833603] 2014-04-09 17:46:48,833 
ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

errors, which are spamming our server logs quite a lot (I originally thought 
this might be caused by KEEPALIVE, which is when I realized that the 
connections weren’t in keep alive and were building up) - it would be nice if 
netty would tell us a little about the socket channel in the error 
message (maybe there is a way to do this by changing log levels, but as I say I 
haven’t had time to go digging there)

I will probably file a JIRA issue to add the setting (since I can’t see any 
particular harm to setting keepalive)

On Apr 9, 2014, at 1:34 PM, Michael Shuler mich...@pbandjelly.org wrote:

 On 04/09/2014 12:41 PM, graham sanderson wrote:
 Michael, it is not that the connections are being dropped, it is that
 the connections are not being dropped.
 
 Thanks for the clarification.
 
 These server side sockets are ESTABLISHED, even though the client
 connection on the other side of the network device is long gone. This
 may well be an issue with the network device (it is valiantly trying
 to keep the connection alive it seems).
 
 Have you tested if they *ever* time out on their own, or do they just keep 
 sticking around forever? (maybe 432000 sec (120 hours), which is the default 
 for nf_conntrack_tcp_timeout_established?) Trying out all the usage scenarios 
 is really the way to track it down - directly on switch, behind/in front of 
 firewall, on/off the VPN.
 
 That said KEEPALIVE on the server side would not be a bad idea. At
 least then the OS on the server would eventually (probably after 2
 hours of inactivity) attempt to ping the client. At that point
 hopefully something interesting would happen perhaps causing an error
 and destroying the server side socket (note KEEPALIVE is also good
 for preventing idle connections from being dropped by other network
 devices along the way)
 
 Tuning net.ipv4.tcp_keepalive_* could be helpful, if you know they timeout 
 after 2 hours, which is the default.
 
 rpc_keepalive on the server sets keep alive on the server side
 sockets for thrift, and is true by default
 
 There doesn’t seem to be a setting for the native protocol
 
 Note this isn’t a huge issue for us, they can be cleaned up by a
 rolling restart, and this particular case is not production, but
 related to development/testing against alpha by people working
 remotely over VPN - and it may well be the VPNs fault in this case…
 that said and maybe this is a dev list question, it seems like the
 option to set keepalive should exist.
 
 Yeah, but I agree you shouldn't have to restart to clean up connections - 
 that's why I think it is lower in the network stack, and that a bit of 
 troubleshooting and tuning might be helpful. That setting sounds like a good 
 Jira request - keepalive may be the default, I'm not sure. :)
 
 -- 
 Michael
 
 On Apr 9, 2014, at 12:25 PM, Michael Shuler mich...@pbandjelly.org
 wrote:
 
 On 04/09/2014 11:39 AM, graham sanderson wrote:
 Thanks, but I would think that just sets keep alive from the
 client end; I’m talking about the server end… this is one of
 those issues where there is something (e.g. switch, firewall, VPN)
 in between the client and the server, and we get left with
 orphaned established connections to the server when the client is
 gone.
 
 There would be no server setting for any service, not just c*, that
 would correct mis-configured connection-assassinating network gear
 between 

Re: Update SSTable fragmentation

2014-04-09 Thread Ken Hancock
I don't believe so.  Cassandra still needs to hit the bloom filters for
each SSTable and then reconcile all versions and all tombstones for any
row.  That's why overwrites have similar performance impact as tombstones,
overwrites just happen to be less common.



On Wed, Apr 9, 2014 at 2:42 PM, Wayne Schroeder 
wschroe...@pinsightmedia.com wrote:

 I've been doing a lot of reading on SSTable fragmentation due to updates
 and the costs associated with reconstructing the end data from multiple
 SSTables that have been created over time and not yet compacted.  One
 question is stuck in my head: If you re-insert entire rows instead of
 updating one column, will Cassandra end up flushing that entire row into one
 SSTable on disk and then end up finding a non-fragmented entire row
 quickly on reads instead of potentially reconstructing it across multiple
 SSTables?  Obviously this has implications for space as a trade off.

 Wayne




-- 
*Ken Hancock* | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx

Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
Skype: hancockks | Yahoo IM: hancockks
LinkedIn: http://www.linkedin.com/in/kenhancock

This e-mail and any attachments may contain information which is SeaChange
International confidential. The information enclosed is intended only for the
addressees herein and may not be copied or forwarded without permission from
SeaChange International.
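
To make the trade-off Wayne asks about concrete, here are the two write
patterns side by side, against a hypothetical demo.users table (contact
point and schema are placeholders). Per Ken's point, either way the write
lands in the memtable and is flushed into a new SSTable, and a read still
consults the bloom filters of the other candidate SSTables and merges by
timestamp until compaction collapses them; the full-row rewrite mainly
raises the odds that the newest SSTable can satisfy the whole row, at the
cost of rewriting unchanged columns.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class OverwriteVsUpdateSketch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("demo");
            // Pattern 1: update one column -- a small write, but the row's
            // other columns stay in whatever older SSTables hold them.
            session.execute("UPDATE users SET email = ? WHERE id = ?",
                    "new@example.com", 42);
            // Pattern 2: re-insert the whole row -- every column gets a fresh
            // timestamp in the newest SSTable, trading write volume and space
            // for the chance that one SSTable holds the complete latest row.
            session.execute("INSERT INTO users (id, name, email, city) VALUES (?, ?, ?, ?)",
                    42, "Jane", "new@example.com", "Acton");
            cluster.close();
        }
    }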


Re: Multiget performance

2014-04-09 Thread Tyler Hobbs
Can you trace the query and paste the results?


On Wed, Apr 9, 2014 at 11:17 AM, Allan C alla...@gmail.com wrote:

 As one CQL statement:

  SELECT * from Event WHERE key IN ([100 keys]);

 -Allan

 On April 9, 2014 at 12:52:13 AM, Daniel Chia (danc...@coursera.org) wrote:

 Are you making the 100 calls in serial, or in parallel?

 Thanks,
 Daniel


 On Tue, Apr 8, 2014 at 11:22 PM, Allan C alla...@gmail.com wrote:

  Hi all,

  I've always been told that multigets are a Cassandra anti-pattern for
 performance reasons. I ran a quick test tonight to prove it to myself, and,
 sure enough, slowness ensued. It takes about 150ms to get 100 keys for my
 use case. Not terrible, but at least an order of magnitude from what I need
 it to be.

  So far, I've been able to denormalize and not have any problems. Today,
 I ran into a use case where denormalization introduces a huge amount of
 complexity to the code.

  It's very tempting to cache a subset in Redis and call it a day --
 probably will. But, that's not a very satisfying answer. It's only about
 5GB of data and it feels like I should be able to tune a Cassandra CF to be
 within 2x.

  The workload is around 70% reads. Most of the writes are updates to
 existing data. Currently, it's in an LCS CF with ~30M rows. The cluster is
 300GB total with 3-way replication, running across 12 fairly large boxes
 with 16G RAM. All on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).


 Has anyone had success getting good results for this kind of workload?
 Or, is Cassandra just not suited for it at all and I should just use an
 in-memory store?

  -Allan





-- 
Tyler Hobbs
DataStax http://datastax.com/
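
For anyone who wants to follow Tyler's suggestion programmatically (TRACING
ON in cqlsh works as well), a sketch using the Java driver 2.x tracing API.
The Event table is from Allan's query; the contact point and keyspace are
placeholders.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryTrace;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class TraceMultiget {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");
            Statement stmt = new SimpleStatement(
                    "SELECT * FROM Event WHERE key IN ('k1', 'k2')").enableTracing();
            ResultSet rs = session.execute(stmt);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            System.out.printf("%s took %d microseconds%n",
                    trace.getRequestType(), trace.getDurationMicros());
            for (QueryTrace.Event event : trace.getEvents()) {
                // Each event shows which node did what and the elapsed time
                // there -- this is where fan-out and slow replicas show up.
                System.out.printf("%8d us | %-15s | %s%n",
                        event.getSourceElapsedMicros(), event.getSource(),
                        event.getDescription());
            }
            cluster.close();
        }
    }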


Re: Upgrading Cassandra

2014-04-09 Thread Tyler Hobbs
On Tue, Apr 8, 2014 at 4:39 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:


 Yet, can't we rebuild a new DC with the current C* version, upgrade it to
 the new major once it is fully part of the C* cluster, and then switch all
 the clients to the new DC once we are sure everything is ok and shut down
 the old one ?


Yes



 I mean, on a multi-DC setup, while upgrading, there must be a moment when 2
 DCs don't have the same major version; this is probably supported.


It is supported; you just don't want to add/remove nodes, run repairs,
etc. with a mixed cluster.


-- 
Tyler Hobbs
DataStax http://datastax.com/


How to replace cluster name without any impact?

2014-04-09 Thread Check Peck
We have an around-36-node Cassandra cluster and three datacenters.
Each datacenter has 12 nodes.

We already have data flowing into Cassandra and we cannot wipe out all
our data now.

Considering this - what is the right way to rename the cluster with no
or minimal impact?


Re: How to replace cluster name without any impact?

2014-04-09 Thread Mark Reddy
What version are you running? As of 1.2.x you can do the following:

1. Start cqlsh connected locally to the node.
2. Run:
update system.local set cluster_name='$CLUSTER_NAME' where key='local';
3. Run nodetool flush on the node.
4. Update the cassandra.yaml file on the node, changing the cluster_name to
the same as you set in step 2.
5. Restart the node.


Please be aware that you will have two partial clusters until you complete
your rolling restart. Also, considering that the cluster name is only a
cosmetic value, my opinion would be to leave it, as the risk far outweighs
the benefits of changing it.


Mark


On Thu, Apr 10, 2014 at 2:49 AM, Check Peck comptechge...@gmail.com wrote:

 We have an around-36-node Cassandra cluster and three datacenters.
 Each datacenter has 12 nodes.

 We already have data flowing into Cassandra and we cannot wipe out all
 our data now.

 Considering this - what is the right way to rename the cluster with no
 or minimal impact?