RE: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-03 Thread Jacques-Henri Berthemet
You will have the same problem without IF NOT EXISTS; at least I had Cassandra 
2.1 complaining about tables with the same name but different UUIDs. In the end, 
in our case, we have a single application node that is responsible for schema 
upgrades. That's OK for us, as we don't plan to upgrade the schema that often.

--
Jacques-Henri Berthemet

From: Ken Hancock [mailto:ken.hanc...@schange.com]
Sent: mardi 2 février 2016 17:14
To: user@cassandra.apache.org
Subject: Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 
upgrade

Just to close the loop on this, but am I correct that the IF NOT EXISTS isn't 
the real problem?  Even multiple calls to plain CREATE TABLE cause the same schema 
mismatch if done concurrently?  Normally, a CREATE TABLE call returns an 
exception saying that the table already exists.

On Tue, Feb 2, 2016 at 11:06 AM, Jack Krupansky 
> wrote:
And CASSANDRA-10699  seems to be the sub-issue of CASSANDRA-9424 to do that:
https://issues.apache.org/jira/browse/CASSANDRA-10699


-- Jack Krupansky

On Tue, Feb 2, 2016 at 9:59 AM, Sebastian Estevez 
> wrote:

Hi Ken,

Earlier in this thread I posted a link to 
https://issues.apache.org/jira/browse/CASSANDRA-9424

That is the fix for these schema disagreement issues, and as commented there, the 
plan is to use CAS. Until then we have to treat schema changes delicately.

all the best,

Sebastián
On Feb 2, 2016 9:48 AM, "Ken Hancock" 
> wrote:
So this rings odd to me.  If you can accomplish the same thing by using a CAS 
operation, why not fix CREATE TABLE IF NOT EXISTS so that an application that 
creates the table on startup is safe to run on multiple nodes, using CAS to 
safeguard against multiple concurrent creations?

On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens 
> wrote:
There's still a race condition there, because two clients could SELECT at the 
same time as each other, then both INSERT.

You'd be better served with a CAS operation, and let Paxos guarantee 
at-most-once execution.
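
A minimal sketch of that CAS-guarded pattern with the DataStax Java driver; the
schema_locks marker table, keyspace, and table names below are hypothetical, not
something from this thread:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CasGuardedCreate {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks"); // hypothetical keyspace

        // CAS (lightweight transaction): Paxos lets at most one client "win" this row.
        ResultSet rs = session.execute(
            "INSERT INTO schema_locks (table_name, created_by) " +
            "VALUES ('events_2016_01_23', 'node-a') IF NOT EXISTS");
        Row row = rs.one();

        // The LWT result carries an [applied] column saying whether our insert won.
        if (row.getBool("[applied]")) {
            // Only the winner issues the CREATE TABLE.
            session.execute("CREATE TABLE IF NOT EXISTS events_2016_01_23 "
                + "(id uuid PRIMARY KEY, payload text)");
        }
        cluster.close();
    }
}

Note this only serializes which client issues the CREATE; it doesn't make schema
propagation itself transactional, which is what CASSANDRA-9424 / CASSANDRA-10699
are about.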

On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes 
> wrote:
On 01/22/2016 10:29 PM, Kevin Burton wrote:
I sort of agree... but we are also considering migrating to hourly tables... and 
what if the single script doesn't run?

I like having N nodes make changes like this because, in my experience, that 
central/single box will usually fail at the wrong time :-/



On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
> wrote:
Instead of using ZK, why not solve your concurrency problem by removing it?  By 
that, I mean simply have 1 process that creates all your tables instead of 
creating a race condition intentionally?

On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton 
> wrote:
Not sure if this is a bug or just kind of a *fuzzy* area.

In 2.0 this worked fine.

We have a bunch of automated scripts that go through and create tables... one 
per day.

At midnight UTC our entire CQL went offline... took down our whole app.  ;-/

The resolution was a full CQL shutdown and then a DROP TABLE to remove the bad 
tables...

Pretty sure the issue was schema disagreement.

All our CREATE TABLE statements use IF NOT EXISTS, but I think the IF NOT EXISTS 
only checks locally?

My workaround is going to be to use ZooKeeper to create a mutex lock during 
this operation.
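
For reference, a minimal sketch of that ZooKeeper mutex approach using Apache
Curator's InterProcessMutex recipe; the connect string, lock path, and timeout
below are placeholder assumptions:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.util.concurrent.TimeUnit;

public class TableCreateMutex {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; retry up to 3 times with backoff.
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
            "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();

        InterProcessMutex lock = new InterProcessMutex(zk, "/locks/daily-table-create");
        if (lock.acquire(30, TimeUnit.SECONDS)) {
            try {
                // Issue the CREATE TABLE here (e.g. via the Java driver)
                // while holding the lock, so only one node runs it at a time.
            } finally {
                lock.release();
            }
        }
        zk.close();
    }
}

This only serializes the clients that cooperate in taking the lock; it doesn't
change how Cassandra itself propagates schema.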

Any other things I should avoid?


--
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



--
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile

One way to accomplish both (a single process doing the work, while any of the 
machines is able to do it) is to have a control table.

You can have a table that lists which tables have been created and force 
consistency ALL on it. In this table you list the names of the tables created; if a 
table name is in there, it doesn't need to be created again. A rough sketch of this 
follows below.
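
A rough sketch of that check against a hypothetical created_tables control table,
reusing the Java driver; Eric's caveat above still applies, since a plain
SELECT-then-CREATE can race unless the control-table insert itself uses IF NOT EXISTS:

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ControlTableCheck {
    // Returns true if tableName is already listed in the control table.
    static boolean alreadyCreated(Session session, String tableName) {
        SimpleStatement stmt = new SimpleStatement(
            "SELECT table_name FROM created_tables WHERE table_name = ?", tableName);
        stmt.setConsistencyLevel(ConsistencyLevel.ALL); // "force consistency all"
        ResultSet rs = session.execute(stmt);
        return !rs.isExhausted();
    }
}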



--
Ken Hancock | System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | 
www.schange.com | 
NASDAQ:SEAC
Office: +1 (978) 

Re: Missing rows while scanning table using java driver

2016-02-03 Thread Jack Krupansky
CL=ALL has no benefit if RF=1.

Your code snippet doesn't indicate how you initialize and update the token
in the query. The ">" operator would ensure that you skip the first token.

-- Jack Krupansky

On Wed, Feb 3, 2016 at 1:36 AM, Priyanka Gugale  wrote:

> Hi,
>
> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to scan
> a table as per the suggestions given here,
>  On running the code to fetch records from the table, it fetches a different
> number of records on each run. Sometimes it reads all records from the table,
> and sometimes some records are missing. As far as I have observed, there is no
> fixed pattern to the missing records.
>
> I have tried setting the consistency level to ALL while running the select query,
> but still I couldn't fetch all records. Is there any known issue? Or am I
> supposed to do anything more than running a simple "select" statement?
>
> Code snippet to fetch data:
>
>  SimpleStatement stmt = new SimpleStatement(query);
>  stmt.setConsistencyLevel(ConsistencyLevel.ALL);
>  ResultSet result = session.execute(stmt);
>  if (!result.isExhausted()) {
>for (Row row : result) {
>  process(row);
>}
>  }
>
> Query is of the form: select * from %t where token(%p) > %s limit %l;
>
> where t=tablename, %p=primary key, %s=token value of primary key and
> l=limit
>
> I am testing on my local machine and has created a Keyspace with
> replication factor of 1. Also I don't see any errors in the logs.
>
> -Priyanka
>


Re: EC2 storage options for C*

2016-02-03 Thread Will Hayworth
We're using GP2 EBS (3 TB volumes) with m4.xlarges after originally looking
at I2 and D2 instances (thanks, Jeff, for your advice with that one). So
far, so good. (Our workload is write-heavy at the moment but reads are
steadily increasing.)

___
Will Hayworth
Developer, Engagement Engine
My pronoun is "they". 



On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead  wrote:

> For what it's worth we've tried d2 instances and they encourage terrible
> things like super dense nodes (increases your replacement time). In terms
> of useable storage I would go with gp2 EBS on a m4 based instance.
>
> On Mon, 1 Feb 2016 at 14:25 Jack Krupansky 
> wrote:
>
>> Ah, yes, the good old days of m1.large.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa 
>> wrote:
>>
>>> A lot of people use the old gen instances (m1 in particular) because
>>> they came with a ton of effectively free ephemeral storage (up to 1.6TB).
>>> Whether or not they’re viable is a decision for each user to make. They’re
>>> very, very commonly used for C*, though. At a time when EBS was not
>>> sufficiently robust or reliable, a cluster of m1 instances was the de facto
>>> standard.
>>>
>>> The canonical “best practice” in 2015 was i2. We believe we’ve made a
>>> compelling argument to use m4 or c4 instead of i2. There exists a company
>>> we know currently testing d2 at scale, though I’m not sure they have much
>>> in terms of concrete results at this time.
>>>
>>> - Jeff
>>>
>>> From: Jack Krupansky
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Monday, February 1, 2016 at 1:55 PM
>>>
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: EC2 storage options for C*
>>>
>>> Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2
>>> Dense Storage".
>>>
>>> The remaining question is whether any of the "Previous Generation
>>> Instances" should be publicly recommended going forward.
>>>
>>> And whether non-SSD instances should be recommended going forward as
>>> well. sure, technically, someone could use the legacy instances, but the
>>> question is what we should be recommending as best practice going forward.
>>>
>>> Yeah, the i2 instances look like the sweet spot for any non-EBS clusters.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt 
>>> wrote:
>>>
 Hi Jack,

 At the bottom of the instance-types page, there is a link to the
 previous generations, which includes the older series (m1, m2, etc), many
 of which have HDD options.

 There are also the d2 (Dense Storage) instances in the current
 generation that include various combos of local HDDs.

 The i2 series has good sized SSDs available, and has the advanced
 networking option, which is also useful for Cassandra. The enhanced
 networking is available with other instance types as well, as you'll see on
 the feature list under each type.

 Steve



 On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky <
 jack.krupan...@gmail.com> wrote:

> Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic
> question, it seems like magnetic (HDD) is no longer a recommended storage
> option for databases on AWS. In particular, only the C2 Dense Storage
> instances have local magnetic storage - all the other instance types are
> SSD or EBS-only - and EBS Magnetic is only recommended for "Infrequent 
> Data
> Access."
>
> For the record, that AWS doc has Cassandra listed as a use case for i2
> instance types.
>
> Also, the AWS doc lists EBS io2 for the NoSQL database use case and
> gp2 only for the "small to medium databases" use case.
>
> Do older instances with local HDD still exist on AWS (m1, m2, etc.)?
> Is the doc simply for any newly started instances?
>
> See:
> https://aws.amazon.com/ec2/instance-types/
> http://aws.amazon.com/ebs/details/
>
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa  > wrote:
>
>> > My apologies if my questions are actually answered on the video or
>> slides, I just did a quick scan of the slide text.
>>
>> Virtually all of them are covered.
>>
>> > I'm curious where the EBS physical devices actually reside - are
>> they in the same rack, the same data center, same availability zone? I
>> mean, people try to minimize network latency between nodes, so how 
>> exactly
>> is EBS able to avoid network latency?
>>
>> Not published, and probably not a straightforward answer (probably
>> have redundancy cross-az, if it matches some of their other published
>> behaviors). The promise they give you is ‘iops’, with a certain block 
>> size.

Re: EC2 storage options for C*

2016-02-03 Thread Sebastian Estevez
Good points Bryan, some more color:

Regular EBS is *not* okay for C*. But AWS has some nicer EBS now that has
performed okay recently:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

https://www.youtube.com/watch?v=1R-mgOcOSd4


The cloud vendors are moving toward shared storage and we can't ignore that
in the long term (they will push us in that direction financially).
Fortunately their shared storage offerings are also getting better. For
example, Google's elastic storage offering provides very reliable latencies,
which is what we care about most, not IOPS.

On the practical side, a key thing I've noticed with real deployments is
that the size of the volume affects how fast it will perform and how stable
its latencies will be, so make sure to get large EBS volumes (> 1 TB) to get
decent performance, even if your nodes aren't that dense.




All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com







DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Wed, Feb 3, 2016 at 7:23 PM, Bryan Cheng  wrote:

> From my experience, EBS has transitioned from "stay the hell away" to "OK"
> as the new GP2 SSD type has come out and stabilized over the last few
> years, especially with the addition of EBS-optimized instances that have
> dedicated EBS bandwidth. The latter has really helped to stabilize the
> problematic 99.9-percentile latency spikes that used to plague EBS volumes.
>
> EBS (IMHO) has always had operational advantages, but inconsistent latency
> and generally poor performance in the past led many to disregard it.
>
> On Wed, Feb 3, 2016 at 4:09 PM, James Rothering 
> wrote:
>
>> Just curious here ... when did EBS become OK for C*? Didn't they always
>> push towards using ephemeral disks?
>>
>> On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead 
>> wrote:
>>
>>> For what it's worth we've tried d2 instances and they encourage terrible
>>> things like super dense nodes (increases your replacement time). In terms
>>> of useable storage I would go with gp2 EBS on a m4 based instance.
>>>
>>> On Mon, 1 Feb 2016 at 14:25 Jack Krupansky 
>>> wrote:
>>>
 Ah, yes, the good old days of m1.large.

 -- Jack Krupansky

 On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa 
 wrote:

> A lot of people use the old gen instances (m1 in particular) because
> they came with a ton of effectively free ephemeral storage (up to 1.6TB).
> Whether or not they’re viable is a decision for each user to make. They’re
> very, very commonly used for C*, though. At a time when EBS was not
> sufficiently robust or reliable, a cluster of m1 instances was the de 
> facto
> standard.
>
> The canonical “best practice” in 2015 was i2. We believe we’ve made a
> compelling argument to use m4 or c4 instead of i2. There exists a company
> we know currently testing d2 at scale, though I’m not sure they have much
> in terms of concrete results at this time.
>
> - Jeff
>
> From: Jack Krupansky
> Reply-To: "user@cassandra.apache.org"
> Date: Monday, February 1, 2016 at 1:55 PM
>
> To: "user@cassandra.apache.org"
> Subject: Re: EC2 storage options for C*
>
> Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2
> Dense Storage".
>
> The remaining question is whether any of the "Previous Generation
> Instances" should be publicly recommended going forward.
>
> And whether non-SSD instances should be recommended going forward as
> well. sure, technically, someone could use the legacy instances, but the
> question is what we should be recommending as best practice going forward.
>
> Yeah, the i2 instances look like the sweet spot for any non-EBS
> clusters.
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt  > wrote:
>
>> Hi Jack,
>>
>> At the bottom of the instance-types page, there is a link to the

Re: EC2 storage options for C*

2016-02-03 Thread Sebastian Estevez
By the way, if someone wants to do some hard core testing like Al, I wrote
a guide on how to use his tool:

http://www.sestevez.com/how-to-use-toberts-effio/

I'm sure folks on this list would like to see more stats : )

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com







DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Wed, Feb 3, 2016 at 7:27 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Good points Bryan, some more color:
>
> Regular EBS is *not* okay for C*. But AWS has some nicer EBS now that has
> performed okay recently:
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
>
> https://www.youtube.com/watch?v=1R-mgOcOSd4
>
>
> The cloud vendors are moving toward shared storage and we can't ignore
> that in the long term (they will push us in that direction financially).
> Fortunately their shared storage offerings are also getting better. For
> example, Google's elastic storage offering provides very reliable
> latencies, which is what we
> care about most, not IOPS.
>
> On the practical side, a key thing I've noticed with real deployments is
> that the size of the volume affects how fast it will perform and how stable
> its latencies will be, so make sure to get large EBS volumes (> 1 TB) to get
> decent performance, even if your nodes aren't that dense.
>
>
>
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
> 
> 
> 
>
>
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Wed, Feb 3, 2016 at 7:23 PM, Bryan Cheng  wrote:
>
>> From my experience, EBS has transitioned from "stay the hell away" to
>> "OK" as the new GP2 SSD type has come out and stabilized over the last few
>> years, especially with the addition of EBS-optimized instances that have
>> dedicated EBS bandwidth. The latter has really helped to stabilize the
>> problematic 99.9-percentile latency spikes that used to plague EBS volumes.
>>
>> EBS (IMHO) has always had operational advantages, but inconsistent
>> latency and generally poor performance in the past led many to disregard
>> it.
>>
>> On Wed, Feb 3, 2016 at 4:09 PM, James Rothering 
>> wrote:
>>
>>> Just curious here ... when did EBS become OK for C*? Didn't they always
>>> push towards using ephemeral disks?
>>>
>>> On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead 
>>> wrote:
>>>
 For what it's worth we've tried d2 instances and they encourage
 terrible things like super dense nodes (increases your replacement time).
 In terms of useable storage I would go with gp2 EBS on a m4 based instance.

 On Mon, 1 Feb 2016 at 14:25 Jack Krupansky 
 wrote:

> Ah, yes, the good old days of m1.large.
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa  > wrote:
>
>> A lot of people use the old gen instances (m1 in particular) because
>> they came with a ton of effectively free ephemeral storage (up to 1.6TB).
>> Whether or not they’re viable is a decision for each user to make. 
>> They’re
>> very, very commonly used for C*, though. At a time when EBS was not
>> sufficiently robust or reliable, a cluster of 

Re: EC2 storage options for C*

2016-02-03 Thread Bryan Cheng
From my experience, EBS has transitioned from "stay the hell away" to "OK"
as the new GP2 SSD type has come out and stabilized over the last few
years, especially with the addition of EBS-optimized instances that have
dedicated EBS bandwidth. The latter has really helped to stabilize the
problematic 99.9-percentile latency spikes that used to plague EBS volumes.

EBS (IMHO) has always had operational advantages, but inconsistent latency
and generally poor performance in the past led many to disregard it.

On Wed, Feb 3, 2016 at 4:09 PM, James Rothering 
wrote:

> Just curious here ... when did EBS become OK for C*? Didn't they always
> push towards using ephemeral disks?
>
> On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead  wrote:
>
>> For what it's worth we've tried d2 instances and they encourage terrible
>> things like super dense nodes (increases your replacement time). In terms
>> of useable storage I would go with gp2 EBS on a m4 based instance.
>>
>> On Mon, 1 Feb 2016 at 14:25 Jack Krupansky 
>> wrote:
>>
>>> Ah, yes, the good old days of m1.large.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa 
>>> wrote:
>>>
 A lot of people use the old gen instances (m1 in particular) because
 they came with a ton of effectively free ephemeral storage (up to 1.6TB).
 Whether or not they’re viable is a decision for each user to make. They’re
 very, very commonly used for C*, though. At a time when EBS was not
 sufficiently robust or reliable, a cluster of m1 instances was the de facto
 standard.

 The canonical “best practice” in 2015 was i2. We believe we’ve made a
 compelling argument to use m4 or c4 instead of i2. There exists a company
 we know currently testing d2 at scale, though I’m not sure they have much
 in terms of concrete results at this time.

 - Jeff

 From: Jack Krupansky
 Reply-To: "user@cassandra.apache.org"
 Date: Monday, February 1, 2016 at 1:55 PM

 To: "user@cassandra.apache.org"
 Subject: Re: EC2 storage options for C*

 Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2
 Dense Storage".

 The remaining question is whether any of the "Previous Generation
 Instances" should be publicly recommended going forward.

 And whether non-SSD instances should be recommended going forward as
 well. sure, technically, someone could use the legacy instances, but the
 question is what we should be recommending as best practice going forward.

 Yeah, the i2 instances look like the sweet spot for any non-EBS
 clusters.

 -- Jack Krupansky

 On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt 
 wrote:

> Hi Jack,
>
> At the bottom of the instance-types page, there is a link to the
> previous generations, which includes the older series (m1, m2, etc), many
> of which have HDD options.
>
> There are also the d2 (Dense Storage) instances in the current
> generation that include various combos of local HDDs.
>
> The i2 series has good sized SSDs available, and has the advanced
> networking option, which is also useful for Cassandra. The enhanced
> networking is available with other instance types as well, as you'll see 
> on
> the feature list under each type.
>
> Steve
>
>
>
> On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky <
> jack.krupan...@gmail.com> wrote:
>
>> Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic
>> question, it seems like magnetic (HDD) is no longer a recommended storage
>> option for databases on AWS. In particular, only the C2 Dense Storage
>> instances have local magnetic storage - all the other instance types are
>> SSD or EBS-only - and EBS Magnetic is only recommended for "Infrequent 
>> Data
>> Access."
>>
>> For the record, that AWS doc has Cassandra listed as a use case for
>> i2 instance types.
>>
>> Also, the AWS doc lists EBS io2 for the NoSQL database use case and
>> gp2 only for the "small to medium databases" use case.
>>
>> Do older instances with local HDD still exist on AWS (m1, m2, etc.)?
>> Is the doc simply for any newly started instances?
>>
>> See:
>> https://aws.amazon.com/ec2/instance-types/
>> http://aws.amazon.com/ebs/details/
>>
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa <
>> jeff.ji...@crowdstrike.com> wrote:
>>
>>> > My apologies if my questions are actually answered on the video or
>>> slides, I just did a quick scan of the slide text.
>>>
>>> Virtually all of them are covered.
>>>
>>> > I'm curious where the EBS physical devices actually reside - are
>>> they in the same rack, 

Re: EC2 storage options for C*

2016-02-03 Thread Jeff Jirsa
I don’t want to be “that guy”, but there’s literally almost a dozen emails in 
this thread answering exactly that question. Did you read the thread to which 
you replied? 


From:  James Rothering
Reply-To:  "user@cassandra.apache.org"
Date:  Wednesday, February 3, 2016 at 4:09 PM
To:  "user@cassandra.apache.org"
Subject:  Re: EC2 storage options for C*

Just curious here ... when did EBS become OK for C*? Didn't they always push 
towards using ephemeral disks?

On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead  wrote:
For what it's worth we've tried d2 instances and they encourage terrible things 
like super dense nodes (increases your replacement time). In terms of useable 
storage I would go with gp2 EBS on a m4 based instance. 

On Mon, 1 Feb 2016 at 14:25 Jack Krupansky  wrote:
Ah, yes, the good old days of m1.large.

-- Jack Krupansky

On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa  wrote:
A lot of people use the old gen instances (m1 in particular) because they came 
with a ton of effectively free ephemeral storage (up to 1.6TB). Whether or not 
they’re viable is a decision for each user to make. They’re very, very commonly 
used for C*, though. At a time when EBS was not sufficiently robust or 
reliable, a cluster of m1 instances was the de facto standard. 

The canonical “best practice” in 2015 was i2. We believe we’ve made a 
compelling argument to use m4 or c4 instead of i2. There exists a company we 
know currently testing d2 at scale, though I’m not sure they have much in terms 
of concrete results at this time. 

- Jeff

From: Jack Krupansky
Reply-To: "user@cassandra.apache.org"
Date: Monday, February 1, 2016 at 1:55 PM 

To: "user@cassandra.apache.org"
Subject: Re: EC2 storage options for C*

Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2 Dense 
Storage". 

The remaining question is whether any of the "Previous Generation Instances" 
should be publicly recommended going forward.

And whether non-SSD instances should be recommended going forward as well. 
sure, technically, someone could use the legacy instances, but the question is 
what we should be recommending as best practice going forward.

Yeah, the i2 instances look like the sweet spot for any non-EBS clusters.

-- Jack Krupansky

On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt  wrote:
Hi Jack, 

At the bottom of the instance-types page, there is a link to the previous 
generations, which includes the older series (m1, m2, etc), many of which have 
HDD options. 

There are also the d2 (Dense Storage) instances in the current generation that 
include various combos of local HDDs.

The i2 series has good sized SSDs available, and has the advanced networking 
option, which is also useful for Cassandra. The enhanced networking is 
available with other instance types as well, as you'll see on the feature list 
under each type. 

Steve



On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky  wrote:
Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic question, 
it seems like magnetic (HDD) is no longer a recommended storage option for 
databases on AWS. In particular, only the C2 Dense Storage instances have local 
magnetic storage - all the other instance types are SSD or EBS-only - and EBS 
Magnetic is only recommended for "Infrequent Data Access." 

For the record, that AWS doc has Cassandra listed as a use case for i2 instance 
types.

Also, the AWS doc lists EBS io2 for the NoSQL database use case and gp2 only 
for the "small to medium databases" use case.

Do older instances with local HDD still exist on AWS (m1, m2, etc.)? Is the doc 
simply for any newly started instances?

See:
https://aws.amazon.com/ec2/instance-types/
http://aws.amazon.com/ebs/details/


-- Jack Krupansky

On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa  wrote:
> My apologies if my questions are actually answered on the video or slides, I 
> just did a quick scan of the slide text.

Virtually all of them are covered.

> I'm curious where the EBS physical devices actually reside - are they in the 
> same rack, the same data center, same availability zone? I mean, people try 
> to minimize network latency between nodes, so how exactly is EBS able to 
> avoid network latency?

Not published, and probably not a straightforward answer (probably have 
redundancy cross-az, if it matches some of their other published behaviors). 
The promise they give you is ‘iops’, with a certain block size. Some instance 
types are optimized for dedicated, ebs-only network interfaces. Like most 
things in cassandra / cloud, the only way to know for sure is to test it 
yourself and see if observed latency is acceptable (or trust our testing, if 
you assume we’re sufficiently smart and honest). 

> Did your test use Amazon EBS–Optimized Instances?

We tested dozens of instance type/size combinations (literally). The 

Re: EC2 storage options for C*

2016-02-03 Thread Jack Krupansky
I meant to reply earlier that the current DataStax doc on EC2 is actually
reasonably decent. It says this about EBS:

"SSD-backed general purpose volumes (GP2) or provisioned IOPS volumes (PIOPS)
are suitable for production workloads."

with the caveat of:

"EBS magnetic volumes are not recommended for Cassandra data storage
volumes for the following reasons:..."

as well as:

"Note: Use only ephemeral instance-store or the recommended EBS volume
types for Cassandra data storage."

See:
http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningEC2.html





-- Jack Krupansky

On Wed, Feb 3, 2016 at 7:27 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Good points Bryan, some more color:
>
> Regular EBS is *not* okay for C*. But AWS has some nicer EBS now that has
> performed okay recently:
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html
>
> https://www.youtube.com/watch?v=1R-mgOcOSd4
>
>
> The cloud vendors are moving toward shared storage and we can't ignore
> that in the long term (they will push us in that direction financially).
> Fortunately their shared storage offerings are also getting better. For
> example, Google's elastic storage offering provides very reliable
> latencies, which is what we
> care about most, not IOPS.
>
> On the practical side, a key thing I've noticed with real deployments is
> that the size of the volume affects how fast it will perform and how stable
> its latencies will be, so make sure to get large EBS volumes (> 1 TB) to get
> decent performance, even if your nodes aren't that dense.
>
>
>
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
> 
> 
> 
>
>
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Wed, Feb 3, 2016 at 7:23 PM, Bryan Cheng  wrote:
>
>> From my experience, EBS has transitioned from "stay the hell away" to
>> "OK" as the new GP2 SSD type has come out and stabilized over the last few
>> years, especially with the addition of EBS-optimized instances that have
>> dedicated EBS bandwidth. The latter has really helped to stabilize the
>> problematic 99.9-percentile latency spikes that used to plague EBS volumes.
>>
>> EBS (IMHO) has always had operational advantages, but inconsistent
>> latency and generally poor performance in the past led many to disregard
>> it.
>>
>> On Wed, Feb 3, 2016 at 4:09 PM, James Rothering 
>> wrote:
>>
>>> Just curious here ... when did EBS become OK for C*? Didn't they always
>>> push towards using ephemeral disks?
>>>
>>> On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead 
>>> wrote:
>>>
 For what it's worth we've tried d2 instances and they encourage
 terrible things like super dense nodes (increases your replacement time).
 In terms of useable storage I would go with gp2 EBS on a m4 based instance.

 On Mon, 1 Feb 2016 at 14:25 Jack Krupansky 
 wrote:

> Ah, yes, the good old days of m1.large.
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa  > wrote:
>
>> A lot of people use the old gen instances (m1 in particular) because
>> they came with a ton of effectively free ephemeral storage (up to 1.6TB).
>> Whether or not they’re viable is a decision for each user to make. 
>> They’re
>> very, very commonly used for C*, though. At a time when EBS was not
>> sufficiently robust or reliable, a cluster of m1 instances was the de 
>> facto
>> standard.
>>
>> The canonical “best practice” in 2015 was i2. We believe we’ve made a
>> compelling argument to use m4 or c4 instead of i2. There exists a company
>> we know currently testing d2 at scale, though I’m not sure they have much
>> in terms of concrete results at this time.
>>
>> - Jeff
>>
>> From: Jack Krupansky
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, February 1, 2016 at 1:55 PM
>>
>> To: "user@cassandra.apache.org"
>> Subject: 

Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-03 Thread Will Hayworth
Ec2MultiRegionSnitch does remove a bit of complexity for us, speaking as
someone who runs a small cluster that serves one system. It doesn't sound
like the right solution for you, though.

___
Will Hayworth
Developer, Engagement Engine
My pronoun is "they". 



On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> hi;
>   we have a multi-DC cluster spanning across our own private cloud and
> AWS. We are currently using Propertyfile snitch across our cluster.
>
> What is the possibility of using GossipingPropertyFileSnitch on datacenters
> in our private cloud, and Ec2MultiRegionSnitch in AWS?
>
> Thanks in advance for the help.
>
> thanks
> Sai
>


Re: EC2 storage options for C*

2016-02-03 Thread James Rothering
Just curious here ... when did EBS become OK for C*? Didn't they always
push towards using ephemeral disks?

On Wed, Feb 3, 2016 at 12:17 PM, Ben Bromhead  wrote:

> For what it's worth we've tried d2 instances and they encourage terrible
> things like super dense nodes (increases your replacement time). In terms
> of useable storage I would go with gp2 EBS on a m4 based instance.
>
> On Mon, 1 Feb 2016 at 14:25 Jack Krupansky 
> wrote:
>
>> Ah, yes, the good old days of m1.large.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa 
>> wrote:
>>
>>> A lot of people use the old gen instances (m1 in particular) because
>>> they came with a ton of effectively free ephemeral storage (up to 1.6TB).
>>> Whether or not they’re viable is a decision for each user to make. They’re
>>> very, very commonly used for C*, though. At a time when EBS was not
>>> sufficiently robust or reliable, a cluster of m1 instances was the de facto
>>> standard.
>>>
>>> The canonical “best practice” in 2015 was i2. We believe we’ve made a
>>> compelling argument to use m4 or c4 instead of i2. There exists a company
>>> we know currently testing d2 at scale, though I’m not sure they have much
>>> in terms of concrete results at this time.
>>>
>>> - Jeff
>>>
>>> From: Jack Krupansky
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Monday, February 1, 2016 at 1:55 PM
>>>
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: EC2 storage options for C*
>>>
>>> Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2
>>> Dense Storage".
>>>
>>> The remaining question is whether any of the "Previous Generation
>>> Instances" should be publicly recommended going forward.
>>>
>>> And whether non-SSD instances should be recommended going forward as
>>> well. sure, technically, someone could use the legacy instances, but the
>>> question is what we should be recommending as best practice going forward.
>>>
>>> Yeah, the i2 instances look like the sweet spot for any non-EBS clusters.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt 
>>> wrote:
>>>
 Hi Jack,

 At the bottom of the instance-types page, there is a link to the
 previous generations, which includes the older series (m1, m2, etc), many
 of which have HDD options.

 There are also the d2 (Dense Storage) instances in the current
 generation that include various combos of local HDDs.

 The i2 series has good sized SSDs available, and has the advanced
 networking option, which is also useful for Cassandra. The enhanced
 networking is available with other instance types as well, as you'll see on
 the feature list under each type.

 Steve



 On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky <
 jack.krupan...@gmail.com> wrote:

> Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic
> question, it seems like magnetic (HDD) is no longer a recommended storage
> option for databases on AWS. In particular, only the C2 Dense Storage
> instances have local magnetic storage - all the other instance types are
> SSD or EBS-only - and EBS Magnetic is only recommended for "Infrequent 
> Data
> Access."
>
> For the record, that AWS doc has Cassandra listed as a use case for i2
> instance types.
>
> Also, the AWS doc lists EBS io2 for the NoSQL database use case and
> gp2 only for the "small to medium databases" use case.
>
> Do older instances with local HDD still exist on AWS (m1, m2, etc.)?
> Is the doc simply for any newly started instances?
>
> See:
> https://aws.amazon.com/ec2/instance-types/
> http://aws.amazon.com/ebs/details/
>
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa  > wrote:
>
>> > My apologies if my questions are actually answered on the video or
>> slides, I just did a quick scan of the slide text.
>>
>> Virtually all of them are covered.
>>
>> > I'm curious where the EBS physical devices actually reside - are
>> they in the same rack, the same data center, same availability zone? I
>> mean, people try to minimize network latency between nodes, so how 
>> exactly
>> is EBS able to avoid network latency?
>>
>> Not published, and probably not a straightforward answer (probably
>> have redundancy cross-az, if it matches some of their other published
>> behaviors). The promise they give you is ‘iops’, with a certain block 
>> size.
>> Some instance types are optimized for dedicated, ebs-only network
>> interfaces. Like most things in cassandra / cloud, the only way to know 
>> for
>> sure is to test it yourself and see if observed latency is acceptable (or
>> trust our testing, if you assume 

Re: Any tips on how to track down why Cassandra won't cluster?

2016-02-03 Thread Bryan Cheng
> On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III 
> wrote:
>
>>
>> Any suggestions on how to track down what might trigger this problem? I'm
>> not receiving any exceptions.
>>
>
You're not getting "Unable to gossip with any seeds" on the second node?
What does nodetool status show on both machines?


Re: Missing rows while scanning table using java driver

2016-02-03 Thread priyanka gugale
Thanks for all help.
The problem was that my application was starting the read process as soon as the
write was over. I guess due to the eventual consistency of writes in Cassandra,
the reader was missing records as they were still in the process of being written.

I will be using the paging feature for processing rather than what I was doing
earlier with tokens; thanks DuyHai for the suggestion.
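
For reference, the driver-side paging mentioned above looks roughly like this with
the 2.1 Java driver; the fetch size, keyspace, and table name are illustrative:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PagedScan {
    static void scan(Session session) {
        SimpleStatement stmt = new SimpleStatement("SELECT * FROM my_ks.my_table");
        stmt.setFetchSize(1000); // rows per page fetched from the coordinator
        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {
            // The driver fetches the next page transparently as we iterate,
            // so there is no need to track token() boundaries by hand.
            process(row);
        }
    }

    static void process(Row row) { /* application logic */ }
}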

-Priyanka

On Wed, Feb 3, 2016 at 7:41 PM, Jack Krupansky 
wrote:

> CL=ALL has no benefit if RF=1.
>
> Your code snippet doesn't indicate how you initialize and update the token
> in the query. The ">" operator would ensure that you skip the first token.
>
> -- Jack Krupansky
>
> On Wed, Feb 3, 2016 at 1:36 AM, Priyanka Gugale  wrote:
>
>> Hi,
>>
>> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to
>> scan a table as per the suggestions given here,
>>  On running the code to fetch records from the table, it fetches a different
>> number of records on each run. Sometimes it reads all records from the table,
>> and sometimes some records are missing. As far as I have observed, there is no
>> fixed pattern to the missing records.
>>
>> I have tried setting the consistency level to ALL while running the select query,
>> but still I couldn't fetch all records. Is there any known issue? Or am I
>> supposed to do anything more than running a simple "select" statement?
>>
>> Code snippet to fetch data:
>>
>>  SimpleStatement stmt = new SimpleStatement(query);
>>  stmt.setConsistencyLevel(ConsistencyLevel.ALL);
>>  ResultSet result = session.execute(stmt);
>>  if (!result.isExhausted()) {
>>for (Row row : result) {
>>  process(row);
>>}
>>  }
>>
>> Query is of the form: select * from %t where token(%p) > %s limit %l;
>>
>> where t=tablename, %p=primary key, %s=token value of primary key and
>> l=limit
>>
>> I am testing on my local machine and has created a Keyspace with
>> replication factor of 1. Also I don't see any errors in the logs.
>>
>> -Priyanka
>>
>
>


Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-03 Thread sai krishnam raju potturi
thanks a lot Robert. Greatly appreciate it.

thanks
Sai

On Tue, Feb 2, 2016 at 6:19 PM, Robert Coli  wrote:

> On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> What is the possibility of using GossipingPropertyFileSnitch on
>> datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS?
>>
>
> You should just use GPFS everywhere.
>
> This is also the reason why you should not use EC2MRS if you might ever
> have a DC that is outside of AWS. Just use GPFS.
>
> =Rob
> PS - To answer your actual question... one "can" use different snitches on
> a per node basis, but ONE REALLY REALLY SHOULDN'T CONSIDER THIS A VALID
> APPROACH AND IF ONE TRIES AND FAILS I WILL POINT AND LAUGH AND NOT HELP
> THEM :D
>


Any tips on how to track down why Cassandra won't cluster?

2016-02-03 Thread Richard L. Burton III
I'm deploying 2 nodes at the moment using cassandra-dse on Amazon. I
configured it to use EC2Snitch and configured rackdc to use us-east with
rack "1".

The second node points to the first node as the seed e.g., "seeds":
["54.*.*.*"] and all of the ports are open.

Any suggestions on how to track down what might trigger this problem? I'm
not receiving any exceptions.

-- 
-Richard L. Burton III
@rburton


Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-03 Thread Ben Bromhead
Also you may want to run multiple data centres in the one AWS region (load
segmentation, spark etc). +1 GPFS for everything

On Wed, 3 Feb 2016 at 07:42 sai krishnam raju potturi 
wrote:

> thanks a lot Robert. Greatly appreciate it.
>
> thanks
> Sai
>
> On Tue, Feb 2, 2016 at 6:19 PM, Robert Coli  wrote:
>
>> On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>>> What is the possibility of using GossipingPropertyFileSnitch on
>>> datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS?
>>>
>>
>> You should just use GPFS everywhere.
>>
>> This is also the reason why you should not use EC2MRS if you might ever
>> have a DC that is outside of AWS. Just use GPFS.
>>
>> =Rob
>> PS - To answer your actual question... one "can" use different snitches
>> on a per node basis, but ONE REALLY REALLY SHOULDN'T CONSIDER THIS A VALID
>> APPROACH AND IF ONE TRIES AND FAILS I WILL POINT AND LAUGH AND NOT HELP
>> THEM :D
>>
>
> --
Ben Bromhead
CTO | Instaclustr
+1 650 284 9692


Re: Any tips on how to track down why Cassandra won't cluster?

2016-02-03 Thread Ben Bromhead
Check network connectivity. If you are using public addresses as the
broadcast address, make sure you can telnet from one node to the other node's
public address on the internode port.

Last time I looked into something like this, for some reason if you only
add a security group ID to the allowed traffic in a security group, you
still need to add the public IP addresses of each node to the security group's
allowed inbound traffic as well.

On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III 
wrote:

> I'm deploying 2 nodes at the moment using cassandra-dse on Amazon. I
> configured it to use EC2Snitch and configured rackdc to use us-east with
> rack "1".
>
> The second node points to the first node as the seed e.g., "seeds":
> ["54.*.*.*"] and all of the ports are open.
>
> Any suggestions on how to track down what might trigger this problem? I'm
> not receiving any exceptions.
>
>
> --
> -Richard L. Burton III
> @rburton
>
-- 
Ben Bromhead
CTO | Instaclustr
+1 650 284 9692


Re: EC2 storage options for C*

2016-02-03 Thread Ben Bromhead
For what it's worth, we've tried d2 instances and they encourage terrible
things like super dense nodes (which increase your replacement time). In terms
of usable storage I would go with gp2 EBS on an m4-based instance.

On Mon, 1 Feb 2016 at 14:25 Jack Krupansky  wrote:

> Ah, yes, the good old days of m1.large.
>
> -- Jack Krupansky
>
> On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa 
> wrote:
>
>> A lot of people use the old gen instances (m1 in particular) because they
>> came with a ton of effectively free ephemeral storage (up to 1.6TB).
>> Whether or not they’re viable is a decision for each user to make. They’re
>> very, very commonly used for C*, though. At a time when EBS was not
>> sufficiently robust or reliable, a cluster of m1 instances was the de facto
>> standard.
>>
>> The canonical “best practice” in 2015 was i2. We believe we’ve made a
>> compelling argument to use m4 or c4 instead of i2. There exists a company
>> we know currently testing d2 at scale, though I’m not sure they have much
>> in terms of concrete results at this time.
>>
>> - Jeff
>>
>> From: Jack Krupansky
>> Reply-To: "user@cassandra.apache.org"
>> Date: Monday, February 1, 2016 at 1:55 PM
>>
>> To: "user@cassandra.apache.org"
>> Subject: Re: EC2 storage options for C*
>>
>> Thanks. My typo - I referenced "C2 Dense Storage" which is really "D2
>> Dense Storage".
>>
>> The remaining question is whether any of the "Previous Generation
>> Instances" should be publicly recommended going forward.
>>
>> And whether non-SSD instances should be recommended going forward as
>> well. sure, technically, someone could use the legacy instances, but the
>> question is what we should be recommending as best practice going forward.
>>
>> Yeah, the i2 instances look like the sweet spot for any non-EBS clusters.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 1, 2016 at 4:30 PM, Steve Robenalt 
>> wrote:
>>
>>> Hi Jack,
>>>
>>> At the bottom of the instance-types page, there is a link to the
>>> previous generations, which includes the older series (m1, m2, etc), many
>>> of which have HDD options.
>>>
>>> There are also the d2 (Dense Storage) instances in the current
>>> generation that include various combos of local HDDs.
>>>
>>> The i2 series has good sized SSDs available, and has the advanced
>>> networking option, which is also useful for Cassandra. The enhanced
>>> networking is available with other instance types as well, as you'll see on
>>> the feature list under each type.
>>>
>>> Steve
>>>
>>>
>>>
>>> On Mon, Feb 1, 2016 at 1:17 PM, Jack Krupansky >> > wrote:
>>>
 Thanks. Reading a little bit on AWS, and back to my SSD vs. magnetic
 question, it seems like magnetic (HDD) is no longer a recommended storage
 option for databases on AWS. In particular, only the C2 Dense Storage
 instances have local magnetic storage - all the other instance types are
 SSD or EBS-only - and EBS Magnetic is only recommended for "Infrequent Data
 Access."

 For the record, that AWS doc has Cassandra listed as a use case for i2
 instance types.

 Also, the AWS doc lists EBS io2 for the NoSQL database use case and gp2
 only for the "small to medium databases" use case.

 Do older instances with local HDD still exist on AWS (m1, m2, etc.)? Is
 the doc simply for any newly started instances?

 See:
 https://aws.amazon.com/ec2/instance-types/
 http://aws.amazon.com/ebs/details/


 -- Jack Krupansky

 On Mon, Feb 1, 2016 at 2:09 PM, Jeff Jirsa 
 wrote:

> > My apologies if my questions are actually answered on the video or
> slides, I just did a quick scan of the slide text.
>
> Virtually all of them are covered.
>
> > I'm curious where the EBS physical devices actually reside - are
> they in the same rack, the same data center, same availability zone? I
> mean, people try to minimize network latency between nodes, so how exactly
> is EBS able to avoid network latency?
>
> Not published, and probably not a straightforward answer (probably
> have redundancy cross-az, if it matches some of their other published
> behaviors). The promise they give you is ‘iops’, with a certain block 
> size.
> Some instance types are optimized for dedicated, ebs-only network
> interfaces. Like most things in cassandra / cloud, the only way to know 
> for
> sure is to test it yourself and see if observed latency is acceptable (or
> trust our testing, if you assume we’re sufficiently smart and honest).
>
> > Did your test use Amazon EBS–Optimized Instances?
>
> We tested dozens of instance type/size combinations (literally). The
> best performance was clearly with ebs-optimized instances that also have
> enhanced networking (c4, m4, etc) - slide 43
>
> >