ETA for Cassandra 2.1 final release

2014-09-08 Thread Eugene Voytitsky

Hi all,

Is there a preliminary date for when Cassandra 2.1 will be finally released
(not beta/rc)?


--
Best regards,
Eugene Voytitsky


Re: ETA for Cassandra 2.1 final release

2014-09-08 Thread Benedict Elliott Smith
It's up for vote right now, so it should be just a few days unless something
unexpected happens.

On Mon, Sep 8, 2014 at 4:46 PM, Eugene Voytitsky viy@gmail.com wrote:

 Hi all,

 Is there a preliminary date for when Cassandra 2.1 will be finally released
 (not beta/rc)?

 --
 Best regards,
 Eugene Voytitsky



Cassandra 2.0.5 : *-jb-27-Data.db (No such file or directory)

2014-09-08 Thread Shing Hing Man
Hi,
   I am running Cassandra 2.0.5 on my PC (with just one node and the default 
cassandra.yaml). 

I have inserted one million rows into a column family (each row has an int key
and two small set<string> columns).
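
For reference, a minimal sketch of a schema matching that description (the
column names below are illustrative, not from the original post, and in CQL
the set element type is text rather than string):

CREATE TABLE testks.ips_table (
    id      int PRIMARY KEY,   -- the int key mentioned above
    tags    set<text>,         -- first small set column
    sources set<text>          -- second small set column
);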

In cqlsh, when I did a select count:

 cqlsh:testks> select count(*) from ips_table limit 200;

I got the following exception :
 
ERROR 16:04:05,657 Exception in thread Thread[ReadStage:66,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /home/shing/installed/cassandras/filelogs/data/testks/ips_table/testks-ips_table-jb-27-Data.db (No such file or directory)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1935)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /home/shing/installed/cassandras/filelogs/data/testks/ips_table/testks-ips_table-jb-27-Data.db (No such file or directory)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
    at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1362)
    at org.apache.cassandra.io.sstable.SSTableScanner.init(SSTableScanner.java:67)
    at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1147)
    at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:69)
    at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1599)
    at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1718)
    at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:111)
    at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1418)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1931)
    ... 3 more


Are there some Cassandra parameters I could set to get rid of the above
exception?

Thanks in advance for any assistance !

Shing

Re: Questions about cleaning up/purging Hinted Handoffs

2014-09-08 Thread Robert Coli
On Fri, Sep 5, 2014 at 3:20 PM, Rahul Neelakantan ra...@rahul.be wrote:

 The reason I asked about the hints is because I see hints being replayed
 but the large compacted hints sstable still sticks around; perhaps it is a
 bug with that version.


I've seen this behavior with HH in older versions, so probably.

=Rob


Failed to enable shuffling error

2014-09-08 Thread Tim Heckman
Hello,

I'm looking to convert our recently upgraded Cassandra cluster from a
single token per node to using vnodes. We've determined, based on our
data consistency and usage patterns, that shuffling will be the best
way to convert our live cluster.

However, when following the instructions for doing the shuffle, we
aren't able to enable shuffling on the other 4 nodes in the cluster.
We get the error message 'Failed to enable shuffling', which looks to
be a generic string printed when a JMX IOException is caught.
Unfortunately, the underlying error is not printed so I'm effectively
troubleshooting in the dark.

I've done some mailing list diving, as well as Google skimming, and
none of the suggestions seemed to work.

I've confirmed that a firewall is not the cause as I am able to
establish a TCP socket (using telnet) from one node to the other. I've
also double-checked the JMX-specific settings that are being set for
Cassandra and those look good. I'm going with the most open settings
now to try and get this working:

-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false

I also tried playing with the 'java.rmi.server.hostname' setting, but
none of the options set seemed to make a difference (hostname, fqdn,
public IPv4 address, private EC2 address).
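
One way to surface the underlying IOException that the generic message hides
is a minimal JMX probe run from the node where cassandra-shuffle fails. This
is only a sketch, assuming the default JMX port 7199 and the unauthenticated
settings above:

import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Opens the same kind of JMX/RMI connection that nodetool and
// cassandra-shuffle use; if the connection fails, the real exception is
// printed instead of a generic error string.
public class JmxProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            System.out.println("Connected: " + connector.getConnectionId());
        }
    }
}

Pointing it at each remote node from the host where the shuffle command fails
should reveal whether the problem is the JMX connection itself or something
further along.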

Without any further information from the 'cassandra-shuffle' utility
I'm pretty much out of ideas. Any suggestions would be greatly
appreciated!

Cheers!
-Tim


Moving Cassandra from EC2 Classic into VPC

2014-09-08 Thread Oleg Dulin

Dear Colleagues:

I need to move Cassandra from EC2 classic into VPC.

What I was thinking is that I can create a new data center within VPC 
and rebuild it from my existing one (switching to vnodes while I am at 
it). However, I don't understand how the ec2-snitch will deal with this.


Another idea I had was taking the ec2-snitch configuration and 
converting it into a Property file snitch. But I still don't understand 
how to perform this move since I need my newly created VPC instances to 
have public IPs -- something I would like to avoid.
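
For context, the file a PropertyFileSnitch reads (cassandra-topology.properties)
would look roughly like this if the names were chosen to mirror what the EC2
snitch derives, i.e. region as the data centre and availability zone as the
rack; the addresses below are placeholders:

# cassandra-topology.properties (placeholder addresses)
10.0.1.10=us-east:1a
10.0.1.11=us-east:1b
10.0.1.12=us-east:1c
# fallback for nodes not listed explicitly
default=us-east:1a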


Any thoughts are appreciated.

Regards,
Oleg




Re: Moving Cassandra from EC2 Classic into VPC

2014-09-08 Thread Bram Avontuur
I have set up Cassandra in a VPC with the EC2Snitch and it works without
issues. I didn't need to do anything special to the configuration. I have
created instances in 2 availability zones, and it automatically picks them
up as 2 different racks. Just make sure your nodes can see each other in
the VPC, e.g. set up a security group that allows connections from other
nodes in the same group.
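
As a rough sketch of that security group setup (the group ID below is a
placeholder), the intra-cluster ports can be opened to the group itself:

# storage (7000/7001), JMX (7199), native protocol (9042) and Thrift (9160),
# allowed only from members of the same security group
for port in 7000 7001 7199 9042 9160; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx --protocol tcp --port "$port" \
    --source-group sg-xxxxxxxx
done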

There should be no need to use public IP's if whatever talks to cassandra
is also within your VPC.

Hope this helps.
Bram


On Mon, Sep 8, 2014 at 3:34 PM, Oleg Dulin oleg.du...@gmail.com wrote:

 Dear Colleagues:

 I need to move Cassandra from EC2 classic into VPC.

 What I was thinking is that I can create a new data center within VPC and
 rebuild it from my existing one (switching to vnodes while I am at it).
 However, I don't understand how the ec2-snitch will deal with this.

 Another idea I had was taking the ec2-snitch configuration and converting
 it into a Property file snitch. But I still don't understand how to perform
 this move since I need my newly created VPC instances to have public IPs --
 something I would like to avoid.

 Any thoughts are appreciated.

 Regards,
 Oleg





Re: Moving Cassandra from EC2 Classic into VPC

2014-09-08 Thread Oleg Dulin
I get that, but if you read my opening post, I have an existing cluster 
in EC2 classic that I have no idea how to move to VPC cleanly.



On 2014-09-08 19:52:28 +, Bram Avontuur said:

I have setup Cassandra into VPC with the EC2Snitch and it works without 
issues. I didn't need to do anything special to the configuration. I 
have created instances in 2 availability zones, and it automatically
picks it up as 2 different data racks. Just make sure your nodes can 
see each other in the VPC, e.g. setup a security group that allows 
connections from other nodes from the same group.


There should be no need to use public IP's if whatever talks to 
cassandra is also within your VPC.


Hope this helps.
Bram


On Mon, Sep 8, 2014 at 3:34 PM, Oleg Dulin oleg.du...@gmail.com wrote:
Dear Colleagues:

I need to move Cassandra from EC2 classic into VPC.

What I was thinking is that I can create a new data center within VPC 
and rebuild it from my existing one (switching to vnodes while I am at 
it). However, I don't understand how the ec2-snitch will deal with this.


Another idea I had was taking the ec2-snitch configuration and 
converting it into a Property file snitch. But I still don't understand 
how to perform this move since I need my newly created VPC instances to 
have public IPs -- something I would like to avoid.


Any thoughts are appreciated.

Regards,
Oleg





Re: Failed to enable shuffling error

2014-09-08 Thread Tim Heckman
On Mon, Sep 8, 2014 at 11:19 AM, Robert Coli rc...@eventbrite.com wrote:
 On Mon, Sep 8, 2014 at 11:08 AM, Tim Heckman t...@pagerduty.com wrote:

 I'm looking to convert our recently upgraded Cassandra cluster from a
 single token per node to using vnodes. We've determined that based on
 our data consistency and usage patterns that shuffling will be the
 best way to convert our live cluster.


 You apparently haven't read anything else about shuffling, or you would have
 learned that no one has ever successfully done it in a real production
 cluster. ;)

I've definitely seen the horror stories that have come out of shuffle.
:) We plan on giving this a trial run on production-sized data before
actually doing it on our production hardware.


 Unfortunately, the underlying error is not printed so I'm effectively
 troubleshooting in the dark.


 This mysterious error is protecting you from a probably quite negative
 experience with shuffle.

We're still at the exploratory stage on systems that are not
production-facing but contain production-like data. Based on our
placement strategy we have some concerns that the new datacenter
approach may be riskier or more difficult. We're just trying to gauge
both paths and see what works best for us.


 I've done some mailing list diving, as well as Google skimming, and
 all the suggestions did not seem to work.


 What version of Cassandra are you running? I would not be surprised if
 shuffle is in fact completely broken in 2.0.x release, not only hazardous to
 attempt.

 Why do you believe that you want to shuffle and/or enable vnodes? How large
 is the cluster and how large is it likely to become?

We're still back on the 1.2 version of Cassandra, specifically 1.2.16 for
the majority of our clusters, with one cluster having been created after
the 1.2.18 release.

The cluster I'm testing this on is a 5 node cluster with a placement
strategy such that all nodes contain 100% of the data. In practice we
have six clusters of similar size that are used for different
services. These different clusters may need additional capacity at
different times, so it's hard to answer the maximum size question. For
now let's just assume that the clusters may never see an 11th
member... but no guarantees.

We're looking to use vnodes to help with easing the administrative
work of scaling out the cluster, along with the improvements to
streaming data during repairs, amongst others.

For shuffle, it looks like it may be easier than adding a new
datacenter and then having to adjust the schema for the new datacenter
to come to life. And we weren't sure whether the same pitfalls of
shuffle would affect us while having all data on all nodes.

 =Rob


Thanks for the quick reply, Rob.

-Tim


Re: Failed to enable shuffling error

2014-09-08 Thread Robert Coli
On Mon, Sep 8, 2014 at 1:21 PM, Tim Heckman t...@pagerduty.com wrote:

 We're still at the exploratory stage on systems that are not
 production-facing but contain production-like data. Based on our
 placement strategy we have some concerns that the new datacenter
 approach may be riskier or more difficult. We're just trying to gauge
 both paths and see what works best for us.


Your case of RF=N is probably the best possible case for shuffle, but
general statements about how much this code has been exercised remain. :)


 The cluster I'm testing this on is a 5 node cluster with a placement
 strategy such that all nodes contain 100% of the data. In practice we
 have six clusters of similar size that are used for different
 services. These different clusters may need additional capacity at
 different times, so it's hard to answer the maximum size question. For
 now let's just assume that the clusters may never see an 11th
 member... but no guarantees.


With an RF of 3, clusters of under approximately 10 nodes tend to be a net
loss with vnodes. If these clusters are not very likely to ever have more
than 10 nodes, consider not using vnodes.


 We're looking to use vnodes to help with easing the administrative
 work of scaling out the cluster. The improvements of streaming data
 during repairs amongst others.


Most of these wins don't occur until you have a lot of nodes, but the fixed
costs of having many ranges are paid all the time.


 For shuffle, it looks like it may be easier than adding a new
 datacenter and then have to adjust the schema for a new datacenter
 to come to life. And we weren't sure whether the same pitfalls of
 shuffle would effect us while having all data on all nodes.


Let us know! Good luck!

=Rob


Re: Failed to enable shuffling error

2014-09-08 Thread Jonathan Haddad
I believe shuffle has been removed recently.  I do not recommend using
it for any reason.

If you really want to go vnodes, your only sane option is to add a new
DC that uses vnodes and switch to it.
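
The schema side of that switch is just a replication change per keyspace; a
sketch, with placeholder keyspace and data centre names:

-- run once the vnode-enabled DC has joined, then run
-- 'nodetool rebuild existing_dc' on each node in the new DC
ALTER KEYSPACE my_keyspace WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'existing_dc': 3,
  'vnode_dc': 3
};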

The downside in the 2.0.x branch to using vnodes is that repairs take
N times as long, where N is the number of tokens you put on each node.
I can't think of any other reason why you wouldn't want to use vnodes
(but this may be significant enough for you by itself).
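
N here is the num_tokens setting in cassandra.yaml; a sketch with an
illustrative value:

# cassandra.yaml on a vnode-enabled node; leave initial_token unset
num_tokens: 256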

2.1 should address the repair issue for most use cases.

Jon


On Mon, Sep 8, 2014 at 1:28 PM, Robert Coli rc...@eventbrite.com wrote:
 On Mon, Sep 8, 2014 at 1:21 PM, Tim Heckman t...@pagerduty.com wrote:

 We're still at the exploratory stage on systems that are not
 production-facing but contain production-like data. Based on our
 placement strategy we have some concerns that the new datacenter
 approach may be riskier or more difficult. We're just trying to gauge
 both paths and see what works best for us.


 Your case of RF=N is probably the best possible case for shuffle, but
 general statements about how much this code has been exercised remain. :)


 The cluster I'm testing this on is a 5 node cluster with a placement
 strategy such that all nodes contain 100% of the data. In practice we
 have six clusters of similar size that are used for different
 services. These different clusters may need additional capacity at
 different times, so it's hard to answer the maximum size question. For
 now let's just assume that the clusters may never see an 11th
 member... but no guarantees.


 With RF of 3, cluster sizes of under approximately 10 tend to net lose from
 vnodes. If these clusters are not very likely to ever have more than 10
 nodes, consider not using Vnodes.


 We're looking to use vnodes to help with easing the administrative
 work of scaling out the cluster. The improvements of streaming data
 during repairs amongst others.


 Most of these wins don't occur until you have a lot of nodes, but the fixed
 costs of having many ranges are paid all the time.


 For shuffle, it looks like it may be easier than adding a new
 datacenter and then have to adjust the schema for a new datacenter
 to come to life. And we weren't sure whether the same pitfalls of
 shuffle would effect us while having all data on all nodes.


 Let us know! Good luck!

 =Rob




-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Failed to enable shuffling error

2014-09-08 Thread Tim Heckman
On Mon, Sep 8, 2014 at 1:45 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 I believe shuffle has been removed recently.  I do not recommend using
 it for any reason.

We're still using the 1.2.x branch of Cassandra, and will be for some
time due to the thrift deprecation. Has it only been removed from the
2.x line?

 If you really want to go vnodes, your only sane option is to add a new
 DC that uses vnodes and switch to it.

We use the NetworkTopologyStrategy across three geographically
separated regions. Doing it this way feels a bit more risky based on
our replication strategy. Also, I'm not sure where all we have our
current datacenter names defined across our different internal
repositories. So there could be quite a large number of changes going
this route.

 The downside in the 2.0.x branch to using vnodes is that repairs take
 N times as long, where N is the number of tokens you put on each node.
 I can't think of any other reasons why you wouldn't want to use vnodes
 (but this may be significant enough for you by itself)

 2.1 should address the repair issue for most use cases.

 Jon

Thank you for the notes on the behaviors in the 2.x branch. If we do
move to the 2.x version that's something we'll be keeping in mind.

Cheers!
-Tim

 On Mon, Sep 8, 2014 at 1:28 PM, Robert Coli rc...@eventbrite.com wrote:
 On Mon, Sep 8, 2014 at 1:21 PM, Tim Heckman t...@pagerduty.com wrote:

 We're still at the exploratory stage on systems that are not
 production-facing but contain production-like data. Based on our
 placement strategy we have some concerns that the new datacenter
 approach may be riskier or more difficult. We're just trying to gauge
 both paths and see what works best for us.


 Your case of RF=N is probably the best possible case for shuffle, but
 general statements about how much this code has been exercised remain. :)


 The cluster I'm testing this on is a 5 node cluster with a placement
 strategy such that all nodes contain 100% of the data. In practice we
 have six clusters of similar size that are used for different
 services. These different clusters may need additional capacity at
 different times, so it's hard to answer the maximum size question. For
 now let's just assume that the clusters may never see an 11th
 member... but no guarantees.


 With RF of 3, cluster sizes of under approximately 10 tend to net lose from
 vnodes. If these clusters are not very likely to ever have more than 10
 nodes, consider not using Vnodes.


 We're looking to use vnodes to help with easing the administrative
 work of scaling out the cluster. The improvements of streaming data
 during repairs amongst others.


 Most of these wins don't occur until you have a lot of nodes, but the fixed
 costs of having many ranges are paid all the time.


 For shuffle, it looks like it may be easier than adding a new
 datacenter and then have to adjust the schema for a new datacenter
 to come to life. And we weren't sure whether the same pitfalls of
 shuffle would effect us while having all data on all nodes.


 Let us know! Good luck!

 =Rob




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade


Re: Failed to enable shuffling error

2014-09-08 Thread Jonathan Haddad
Thrift is still present in the 2.0 branch as well as 2.1.  Where did
you see that it's deprecated?

Let me elaborate on my earlier advice.  Shuffle was removed because it
doesn't work for anything beyond a trivial dataset.  It is definitely
more risky than adding a new vnode-enabled DC, as it does not work
at all.

On Mon, Sep 8, 2014 at 2:01 PM, Tim Heckman t...@pagerduty.com wrote:
 On Mon, Sep 8, 2014 at 1:45 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 I believe shuffle has been removed recently.  I do not recommend using
 it for any reason.

 We're still using the 1.2.x branch of Cassandra, and will be for some
 time due to the thrift deprecation. Has it only been removed from the
 2.x line?

 If you really want to go vnodes, your only sane option is to add a new
 DC that uses vnodes and switch to it.

 We use the NetworkTopologyStrategy across three geographically
 separated regions. Doing it this way feels a bit more risky based on
 our replication strategy. Also, I'm not sure where all we have our
 current datacenter names defined across our different internal
 repositories. So there could be quite a large number of changes going
 this route.

 The downside in the 2.0.x branch to using vnodes is that repairs take
 N times as long, where N is the number of tokens you put on each node.
 I can't think of any other reasons why you wouldn't want to use vnodes
 (but this may be significant enough for you by itself)

 2.1 should address the repair issue for most use cases.

 Jon

 Thank you for the notes on the behaviors in the 2.x branch. If we do
 move to the 2.x version that's something we'll be keeping in mind.

 Cheers!
 -Tim

 On Mon, Sep 8, 2014 at 1:28 PM, Robert Coli rc...@eventbrite.com wrote:
 On Mon, Sep 8, 2014 at 1:21 PM, Tim Heckman t...@pagerduty.com wrote:

 We're still at the exploratory stage on systems that are not
 production-facing but contain production-like data. Based on our
 placement strategy we have some concerns that the new datacenter
 approach may be riskier or more difficult. We're just trying to gauge
 both paths and see what works best for us.


 Your case of RF=N is probably the best possible case for shuffle, but
 general statements about how much this code has been exercised remain. :)


 The cluster I'm testing this on is a 5 node cluster with a placement
 strategy such that all nodes contain 100% of the data. In practice we
 have six clusters of similar size that are used for different
 services. These different clusters may need additional capacity at
 different times, so it's hard to answer the maximum size question. For
 now let's just assume that the clusters may never see an 11th
 member... but no guarantees.


 With RF of 3, cluster sizes of under approximately 10 tend to net lose from
 vnodes. If these clusters are not very likely to ever have more than 10
 nodes, consider not using Vnodes.


 We're looking to use vnodes to help with easing the administrative
 work of scaling out the cluster. The improvements of streaming data
 during repairs amongst others.


 Most of these wins don't occur until you have a lot of nodes, but the fixed
 costs of having many ranges are paid all the time.


 For shuffle, it looks like it may be easier than adding a new
 datacenter and then have to adjust the schema for a new datacenter
 to come to life. And we weren't sure whether the same pitfalls of
 shuffle would effect us while having all data on all nodes.


 Let us know! Good luck!

 =Rob




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 twitter: rustyrazorblade



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


When CHANGES and JIRA Fix Versions disagree which should we believe?

2014-09-08 Thread Peter Haggerty
When the CHANGES file shows an issue as being in a particular release,
but the JIRA for the issue shows a different version in Fix Versions,
which one is right?

All four of these are listed in 2.0.10 in the CHANGES file:
https://github.com/apache/cassandra/blob/cassandra-2.0/CHANGES.txt

https://issues.apache.org/jira/browse/CASSANDRA-7808
https://issues.apache.org/jira/browse/CASSANDRA-7810
https://issues.apache.org/jira/browse/CASSANDRA-7828
https://issues.apache.org/jira/browse/CASSANDRA-7145

and this one is in the 2.0.10 section of CHANGES but its Fix Versions
don't include any 2.x versions:
https://issues.apache.org/jira/browse/CASSANDRA-7543

Should I just comment in each JIRA where there is disagreement between
it and the CHANGES file?


Peter


Re: Failed to enable shuffling error

2014-09-08 Thread Robert Coli
On Mon, Sep 8, 2014 at 2:01 PM, Tim Heckman t...@pagerduty.com wrote:

 We're still using the 1.2.x branch of Cassandra, and will be for some
 time due to the thrift deprecation. Has it only been removed from the
 2.x line?


Other than the fact that 2.0.x is not production ready yet, there's no
reason not to go to newer versions. Thrift is deprecated and unmaintained
in versions above 2.0 but is unlikely to be actually removed from the
codebase for at least another 3 or 4 major versions. In fact, the official
statement is that there are no plans to remove it; let us hope that's not
true.

=Rob


Re: When CHANGES and JIRA Fix Versions disagree which should we believe?

2014-09-08 Thread Robert Coli
On Mon, Sep 8, 2014 at 2:56 PM, Peter Haggerty peter.hagge...@librato.com
wrote:

 When the CHANGES file shows an issue as being in a particular release
 but the JIRA for the issue shows a different version in Fix Versions
 which one is right?


CHANGES.txt management is kind of a mess. JIRA is likely to be more correct.

=Rob


Re: When CHANGES and JIRA Fix Versions disagree which should we believe?

2014-09-08 Thread Benedict Elliott Smith
In this case, it seems more likely that CHANGES.txt will be correct, since it
is maintained *at time of commit*, whereas updating the JIRA fix versions can
be forgotten.
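
One way to settle a specific disagreement is to check the release branch
directly; a sketch, using the branch name from the GitHub link above and the
last ticket from the earlier mail:

# find the commit(s) referencing the ticket on the 2.0 branch...
git log --oneline --grep=CASSANDRA-7543 origin/cassandra-2.0
# ...then see which release tags already contain a given commit
git tag --contains <commit-sha>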

On Tue, Sep 9, 2014 at 7:07 AM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 8, 2014 at 2:56 PM, Peter Haggerty peter.hagge...@librato.com
  wrote:

 When the CHANGES file shows an issue as being in a particular release
 but the JIRA for the issue shows a different version in Fix Versions
 which one is right?


 CHANGES.txt management is kinda a mess. JIRA is likely to be more correct.

 =Rob




Re: Moving Cassandra from EC2 Classic into VPC

2014-09-08 Thread Ben Bromhead
On 8 Sep 2014, at 12:34 pm, Oleg Dulin oleg.du...@gmail.com wrote:

 Another idea I had was taking the ec2-snitch configuration and converting it 
 into a Property file snitch. But I still don't understand how to perform this 
 move since I need my newly created VPC instances to have public IPs -- 
 something I would like to avoid.

Off the top of my head, something like this might work if you want a no-downtime
approach:

Use the GossipingPropertyFileSnitch in the VPC data centre.

Use a public elastic ip for each node.

Have the instances in the VPC join your existing cluster.

Decommission old cluster.

Change the advertised endpoint addresses afterwards to the private addresses 
for nodes in the VPC using the following:
https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

Once that is done, remove the elastic IPs from the instances.
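
For the snitch and elastic-IP steps, a sketch of the per-node configuration
(the data centre, rack and addresses below are placeholders):

# cassandra-rackdc.properties on each VPC node (read by GossipingPropertyFileSnitch)
dc=us-east-vpc
rack=1a

# cassandra.yaml while the elastic IPs are still in use
listen_address: 10.0.1.10      # private VPC address
broadcast_address: 54.0.0.10   # public elastic IP; dropped in the final step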