RE: Looking for feedback on automated root-cause system

2019-03-05 Thread Kenneth Brotman
I found their YouTube video, Machine Learning & The future of DevOps – An Intro 
to Vorstella: https://www.youtube.com/watch?v=YZ5_LAXvUUo

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Tuesday, March 05, 2019 11:50 AM
To: user@cassandra.apache.org
Subject: RE: Looking for feedback on automated root-cause system

 

You are the real deal. I know you’ve been a top notch person in the community 
for a long time.  Glad to hear that this is coming.  It’s very exciting!

 

From: Matthew Stump [mailto:mst...@vorstella.com] 
Sent: Tuesday, March 05, 2019 11:47 AM
To: user@cassandra.apache.org
Subject: Re: Looking for feedback on automated root-cause system

 

We probably will, that'll come soon-ish (a couple of weeks perhaps). Right now 
we're limited by who we can engage with in order to collect feedback.

 

On Tue, Mar 5, 2019 at 11:34 AM Kenneth Brotman  
wrote:

Simulators will never get you there.  Why don’t you let everyone plug in to the 
NOC in exchange for standard features or limited scale, and make some money on 
the big cats that you can make the value proposition attractive for anyway.  You 
get the data you have to have – and for free; everyone’s Cassandra cluster gets 
smart!

 

 

From: Matthew Stump [mailto:mst...@vorstella.com] 
Sent: Tuesday, March 05, 2019 11:12 AM
To: user@cassandra.apache.org
Subject: Re: Looking for feedback on automated root-cause system

 

Getting people to send data to us can be a little bit of a PITA, but it's 
doable. We've got data from regulated/secure environments streaming in. None of 
the data we collect is a risk, but the default is to say no and you've got to 
overcome that barrier. We've been through the audit a bunch of times, it gets 
easier each time because everyone asks more or less the same questions and 
requires the same set of disclosures.

 

Cold start for AI is always an issue but we overcame it via two routes:

 

We had customers from a pre-existing line of business. We were probably the 
first ones to run production Cassandra workloads at scale in k8s. We funded the 
work behind some of the initial blog posts and had to figure out most of 
the ins-and-outs of making it work. This data is good for helping to identify 
edge cases and bugs that you wouldn't normally encounter, but it's super noisy 
and you've got to do a lot to isolate and/or derive value from data in the 
beginning if you're attempting to do root cause.

 

Leveraging the above we built out an extensive simulations pipeline. It 
initially started as python scripts targeting k8s, but it's since been fully 
automated with Spinnaker.  We have a couple of simulations running all the time 
doing continuous integration with the models, collectors and pipeline code, but 
will burst out to a couple hundred clusters if we need to test something 
complicated. It takes just a couple of minutes to have it spin up hundreds of 
different load generators, targeting different versions of C*, running with 
different topologies, using clean disks or restoring from previous snapshots.

 

As the corpus grows, simulations matter less, and it's easier to get signal from 
noise in a customer cluster.

 

On Tue, Mar 5, 2019 at 10:15 AM Kenneth Brotman  
wrote:

Matt,

 

Do you anticipate having trouble getting clients to allow the collector to send 
data up to your NOC?  Wouldn’t a lot of companies be unable or uneasy about 
that?

 

Your ML can only work if it’s got LOTS of data from many different scenarios.  
How are you addressing that?  How are you able to get that much good quality 
data?

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Tuesday, March 05, 2019 10:01 AM
To: 'user@cassandra.apache.org'
Subject: RE: Looking for feedback on automated root-cause system

 

I see they have a website now at https://vorstella.com/

 

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email; I didn't see 
the responses until I checked from my personal. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs 
etc.; they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute. The goal is to get you from 0 to 
resolution as quickly as possible.

Re: data modelling

2019-03-05 Thread Stefan Miklosovic
Hi Bobbie,

as Kenneth already mentioned, you should model your schema based on what
queries you are expecting to do and read the related literature. From what I
see, your table is named "customer_sensor_tagids", so it's quite possible you
would have tagids as part of the primary key? Something like:

select * from keyspace.customer_sensor_tagids where tag_id = 11358097.

This implies that you would have as many records per customer and sensor id
as there are tag_ids. If you want to query such a table and you
know customerid and sensorid in advance, you could query like

select * from keyspace.customer_sensor_tagids where customerid = X and
sensorid = Y and tag_id = 11358097

so your primary key would look like (customerid, sensorid, tagid) or
((customerid, sensorid), tagid).

If you do not know customerid nor sensorid while doing a query, you would
have to make tag_id a partition key and customerid and sensorid clustering
columns, optionally ordered; that's up to you. Now you may object that there
would be data duplication as you would have to have "as many tables as
queries", which might be true, but that's not in general a problem. That's the
cost you "pay" for having queries super fast and tailored for your use case.

I suggest reading more about data modelling in general.
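
To make those two options concrete, here is a minimal CQL sketch; the table
names and the bigint types are assumptions based on the schema posted in this
thread ("keyspace" is the same placeholder keyspace name used in the original
post, so substitute your real keyspace), not a definitive model:

-- option 1: customerid and sensorid are known at query time
CREATE TABLE keyspace.tags_by_customer_sensor (
    customerid bigint,
    sensorid bigint,
    tagid bigint,
    PRIMARY KEY ((customerid, sensorid), tagid)
);

SELECT * FROM keyspace.tags_by_customer_sensor
WHERE customerid = 1 AND sensorid = 2 AND tagid = 11358097;

-- option 2: only the tag is known at query time
CREATE TABLE keyspace.sensors_by_tag (
    tagid bigint,
    customerid bigint,
    sensorid bigint,
    PRIMARY KEY (tagid, customerid, sensorid)
);

SELECT customerid, sensorid FROM keyspace.sensors_by_tag
WHERE tagid = 11358097;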

On Wed, 6 Mar 2019 at 11:19, Bobbie Haynes  wrote:

> Hi
> Could you help with modelling this use case?
>
> I have the below table. I will update the tagids column (a set of bigint) based on
> the PK. I have created a secondary index on the tagids column to query like below:
>
> Select * from keyspace.customer_sensor_tagids where tagids CONTAINS
> 11358097;
>
> this query is doing a range scan because of the secondary index and
> causing performance issues
>
> If I create a MV on tagids, will I be able to query like above? Please
> suggest a data model for this scenario. Appreciate your help on this.
>
> ---
>
> ---
> example of Tagids for each row:-
>4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756,
> 8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554,
> 10980255, 11009971, 11043805, 11075379, 11078819, 11167844, 11358097,
> 11479340, 11481769, 11481770, 11481771, 11481772, 11693597, 11709012,
> 12193230, 12421500, 12421516, 12421781, 12422011, 12422368, 12422501,
> 12422512, 12422553, 12422555, 12423381, 12423382
>
>
>  
> ---
>
> ---
>
>CREATE TABLE keyspace.customer_sensor_tagids (
> customerid bigint,
> sensorid bigint,
> XXX frozen,
> XXX frozen,
> XXX text,
> XXX text,
> XXX frozen,
> XXX bigint,
> XXX bigint,
> XXX list>,
> XXX frozen,
> XXX boolean,
> XXX bigint,
> XXX list>,
> XXX frozen,
> XXX bigint,
> XXX bigint,
> XXX list>,
> XXX list>,
> XXX set>,
> XXX set,
> XXX set,
> tagids set<bigint>,
> XXX bigint,
> XXX list>,
> PRIMARY KEY ((customerid, sensorid))
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));
> CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
>


-- 


*Stefan Miklosovic**Senior Software Engineer*


M: +61459911436



   



RE: data modelling

2019-03-05 Thread Kenneth Brotman
You definitely don’t need a secondary index.  A MV might be the answer.  

 

How many tagids does a sensor have?

Do you have to use a collection for tagids?

How many sensors would you expect to have a particular tagid?

Would you know the customerid and sensorid and be able to specify that in the 
query?

 

If you could have tagid not be a collection, and make it part of the primary 
key, that would help a lot.
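
As a rough sketch of that idea (the sensors_by_tag table and the key values
below are assumptions for illustration, and "keyspace" is the placeholder
keyspace name from the original post): instead of indexing the tagids set, the
application could write one row per tag into a table keyed by tag and keep it
in sync with the main table, e.g.

BEGIN BATCH
  UPDATE keyspace.customer_sensor_tagids
    SET tagids = tagids + {11358097}
    WHERE customerid = 1 AND sensorid = 2;
  INSERT INTO keyspace.sensors_by_tag (tagid, customerid, sensorid)
    VALUES (11358097, 1, 2);
APPLY BATCH;

-- replaces the CONTAINS query against the secondary index
SELECT customerid, sensorid FROM keyspace.sensors_by_tag WHERE tagid = 11358097;

A logged batch spanning two tables adds some coordinator overhead; it is only
there so that the two writes eventually apply together.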

  

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Tuesday, March 05, 2019 4:33 PM
To: user@cassandra.apache.org
Subject: RE: data modelling

 

Hi Bobbie,

 

You’re not giving enough information to model the data.  With Cassandra it’s 
based on the queries you are going to need.  This link to Jeffrey Carpenter’s 
book, Cassandra the Definitive Guide, Chapter 5, which is on how to do data 
modeling for Cassandra, should be of help to you: 
https://books.google.com/books?id=uW-PDAAAQBAJ 

 

 

 

 

From: Bobbie Haynes [mailto:haynes30...@gmail.com] 
Sent: Tuesday, March 05, 2019 4:19 PM
To: user@cassandra.apache.org
Subject: data modelling

 

Hi 

   Could you help with modelling this use case?

 

   I have the below table. I will update the tagids column (a set of bigint) based on the PK. 
I have created a secondary index on the tagids column to query like below:

 

Select * from keyspace.customer_sensor_tagids where tagids CONTAINS 11358097;

 

this query is doing a range scan because of the secondary index and causing 
performance issues

 

If I create a MV on tagids, will I be able to query like above? Please suggest 
a data model for this scenario. Appreciate your help on this.

---

---

example of Tagids for each row:-

   4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, 
8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554, 10980255, 
11009971, 11043805, 11075379, 11078819, 11167844, 11358097, 11479340, 11481769, 
11481770, 11481771, 11481772, 11693597, 11709012, 12193230, 12421500, 12421516, 
12421781, 12422011, 12422368, 12422501, 12422512, 12422553, 12422555, 12423381, 
12423382

 

   
---

---
 

 

   CREATE TABLE keyspace.customer_sensor_tagids (

customerid bigint,

sensorid bigint,

XXX frozen,

XXX frozen,

XXX text,

XXX text,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX frozen,

XXX boolean,

XXX bigint,

XXX list>,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX list>,

XXX set>,

XXX set,

XXX set,

tagids set<bigint>,

XXX bigint,

XXX list>,

PRIMARY KEY ((customerid, sensorid))

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);



RE: data modelling

2019-03-05 Thread Kenneth Brotman
Hi Bobbie,

 

You’re not giving enough information to model the data.  With Cassandra it’s 
based on the queries you are going to need.  This link to Jeffrey Carpenter’s 
book, Cassandra the Definitive Guide, Chapter 5, which is on how to do data 
modeling for Cassandra, should be of help to you: 
https://books.google.com/books?id=uW-PDAAAQBAJ 

 

 

 

 

From: Bobbie Haynes [mailto:haynes30...@gmail.com] 
Sent: Tuesday, March 05, 2019 4:19 PM
To: user@cassandra.apache.org
Subject: data modelling

 

Hi 

   Could you help with modelling this use case?

 

   I have the below table. I will update the tagids column (a set of bigint) based on the PK. 
I have created a secondary index on the tagids column to query like below:

 

Select * from keyspace.customer_sensor_tagids where tagids CONTAINS 11358097;

 

this query is doing a range scan because of the secondary index and causing 
performance issues

 

If I create a MV on tagids, will I be able to query like above? Please suggest 
a data model for this scenario. Appreciate your help on this.

---

---

example of Tagids for each row:-

   4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, 
8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554, 10980255, 
11009971, 11043805, 11075379, 11078819, 11167844, 11358097, 11479340, 11481769, 
11481770, 11481771, 11481772, 11693597, 11709012, 12193230, 12421500, 12421516, 
12421781, 12422011, 12422368, 12422501, 12422512, 12422553, 12422555, 12423381, 
12423382

 

   
---

---
 

 

   CREATE TABLE keyspace.customer_sensor_tagids (

customerid bigint,

sensorid bigint,

XXX frozen,

XXX frozen,

XXX text,

XXX text,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX frozen,

XXX boolean,

XXX bigint,

XXX list>,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX list>,

XXX set>,

XXX set,

XXX set,

tagids set<bigint>,

XXX bigint,

XXX list>,

PRIMARY KEY ((customerid, sensorid))

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);



data modelling

2019-03-05 Thread Bobbie Haynes
Hi
   Could you help with modelling this use case?

   I have the below table. I will update the tagids column (a set of bigint) based on the
PK. I have created a secondary index on the tagids column to query like below:

Select * from keyspace.customer_sensor_tagids where tagids CONTAINS
11358097;

this query is doing a range scan because of the secondary index and
causing performance issues

If I create a MV on tagids, will I be able to query like above? Please
suggest a data model for this scenario. Appreciate your help on this.
---
---
example of Tagids for each row:-
   4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756,
8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554,
10980255, 11009971, 11043805, 11075379, 11078819, 11167844, 11358097,
11479340, 11481769, 11481770, 11481771, 11481772, 11693597, 11709012,
12193230, 12421500, 12421516, 12421781, 12422011, 12422368, 12422501,
12422512, 12422553, 12422555, 12423381, 12423382


 
---
---

   CREATE TABLE keyspace.customer_sensor_tagids (
customerid bigint,
sensorid bigint,
XXX frozen,
XXX frozen,
XXX text,
XXX text,
XXX frozen,
XXX bigint,
XXX bigint,
XXX list>,
XXX frozen,
XXX boolean,
XXX bigint,
XXX list>,
XXX frozen,
XXX bigint,
XXX bigint,
XXX list>,
XXX list>,
XXX set>,
XXX set,
XXX set,
tagids set<bigint>,
XXX bigint,
XXX list>,
PRIMARY KEY ((customerid, sensorid))
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));
CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);


RE: AxonOps - Cassandra operational management tool

2019-03-05 Thread Kenneth Brotman
Hayato,

 

I agree with what you are addressing. I’ve always thought the big elephant in 
the room regarding Cassandra was that you had to use all these other tools, 
each of which requires updates and configuration changes, and that too much 
attention had to be paid to all those other tools instead of what you’re trying 
to accomplish; if it were addressed, it all could be centralized or 
internalized, and clearly it was quite doable.  

 

Questions regarding where things are at:

 

Are you using AxonOps in any of your clients Apache Cassandra production 
clusters?

 

What is the largest Cassandra cluster in which you use it?

 

Would you recommend NOT using AxonOps on production clusters for now or do you 
consider it safe to do so?

 

What is the largest Cassandra cluster you would recommend using AxonOps on?

 

Can it handle multi-cloud clusters?

 

Which clouds does it play nice with?

 

Is it good for use for on-prem nodes (or cloud only)?

 

Which versions of Cassandra does it play nice with?

 

Any rough idea when a download will be available?

 

Your blog post at https://digitalis.io/blog/apache-cassandra-management-tool/ 
provides a lot of answers already!  Really very promising!

 

Thanks,

 

Kenneth Brotman

 

 

 

From: AxonOps [mailto:axon...@digitalis.io] 
Sent: Sunday, March 03, 2019 7:51 AM
To: user@cassandra.apache.org
Subject: Re: AxonOps - Cassandra operational management tool

 

Hi Kenneth,

 

Thanks for your great feedback! We're not trying to be secretive, but just not 
amazing at promoting ourselves!

 

AxonOps was built by digitalis.io (https://digitalis.io), a company based in 
the UK providing consulting and managed services for Cassandra, Kafka and 
Spark. digitalis.io was founded 3 years ago by 2 ex-DataStax architects, but 
their experience of Cassandra predates their tenure at DataStax.

 

We have been looking after a lot of Cassandra clusters for our customers, but 
found ourselves spending more time maintaining monitoring and operational tools 
than Cassandra clusters themselves. The motivation was to build a management 
platform to make our lives easier. You can read my blog here - 
https://digitalis.io/blog/apache-cassandra-management-tool/

 

We have not yet created any videos but that's in our backlog so people can see 
AxonOps in action. No testimonials yet either, since the customer of the product 
has been ourselves, and we only just released it to the public as a beta a few 
weeks ago. We've decided to share it for free to anybody using up to 6 nodes, as 
we see a lot of clusters out there within this range.

 

The only investment would be a minimum amount of your time to install it. We 
have made the installation process as easy as possible. Hopefully you will find 
it immensely quicker and easier than installing and configuring ELK, 
Prometheus, Grafana, Nagios, custom backups and repair scheduling. It has 
certainly made our lives easier for sure.

 

We are fully aware of the new features going into 4.0 and beyond. As mentioned 
earlier, we built this for ourselves - a product that does everything we want 
in one solution providing a single pane of glass. It's free and we're sharing 
this with you.

 

Enjoy!

 

Hayato Shimizu

 

 

On Sun, 3 Mar 2019 at 06:05, Kenneth Brotman  
wrote:

Sorry, Nitan was only making a comment about this post but the comments I’m 
making are to AxonOps.  

 

It appears we don’t have a name for anyone at AxonOps at all then!  You guys 
are going to need to be more open.

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Saturday, March 02, 2019 10:02 PM
To: user@cassandra.apache.org
Subject: RE: AxonOps - Cassandra operational management tool

 

Nitan,

 

A few thoughts:


Isn’t it a lot to expect folks to download, install and evaluate the product 
considering,

· You aren’t being very clear about who you are,

· You don’t have any videos demonstrating the product,

· You don’t provide any testimonials, 

· You have no case studies with repeatable results, ROI, etc.  All the 
normal stuff.

· What about testing?  No one knows how well tested it is.  Why would 
we download it?

 

Don’t forget that the open source Cassandra community is already addressing 
ways in which Cassandra itself will be able to do several of the things that 
you listed.  

 

How much added value are you providing with this product?  It’s up to you to 
make the case.  You’ll have to spend more time on the business side of things 
if you want to do any business.

 

Kenneth Brotman

 

From: AxonOps [mailto:axon...@digitalis.io] 
Sent: Saturday, March 02, 2019 3:47 AM
To: user@cassandra.apache.org
Subject: Re: AxonOps - Cassandra operational management tool

 

It's not an open source product but free up to 6 nodes for now. We're actively 
adding more features to it but it currently supports the following features:

 

- Metrics collection and dashboards

- Logs / events 

Re: Looking for feedback on automated root-cause system

2019-03-05 Thread Matthew Stump
We probably will, that'll come soon-ish (a couple of weeks perhaps). Right
now we're limited by who we can engage with in order to collect feedback.

On Tue, Mar 5, 2019 at 11:34 AM Kenneth Brotman
 wrote:

> Simulators will never get you there.  Why don’t you let everyone plug in
> to the NOC in exchange for standard features or limited scale, and make some
> money on the big cats that you can make the value proposition attractive
> for anyway.  You get the data you have to have – and for free; everyone’s
> Cassandra cluster gets smart!
>
>
>
>
>
> *From:* Matthew Stump [mailto:mst...@vorstella.com]
> *Sent:* Tuesday, March 05, 2019 11:12 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Looking for feedback on automated root-cause system
>
>
>
> Getting people to send data to us can be a little bit of a PITA, but it's
> doable. We've got data from regulated/secure environments streaming in.
> None of the data we collect is a risk, but the default is to say no and
> you've got to overcome that barrier. We've been through the audit a bunch
> of times, it gets easier each time because everyone asks more or less the
> same questions and requires the same set of disclosures.
>
>
>
> Cold start for AI is always an issue but we overcame it via two routes:
>
>
>
> We had customers from a pre-existing line of business. We were probably
> the first ones to run production Cassandra workloads at scale in k8s. We
> funded the work behind some of the initial blog posts and had to figure
> out most of the ins-and-outs of making it work. This data is good for
> helping to identify edge cases and bugs that you wouldn't normally
> encounter, but it's super noisy and you've got to do a lot to isolate
> and/or derive value from data in the beginning if you're attempting to do
> root cause.
>
>
>
> Leveraging the above we built out an extensive simulations pipeline. It
> initially started as python scripts targeting k8s, but it's since been
> fully automated with Spinnaker.  We have a couple of simulations running
> all the time doing continuous integration with the models, collectors and
> pipeline code, but will burst out to a couple hundred clusters if we need
> to test something complicated. It takes just a couple of minutes to have
> it spin up hundreds of different load generators, targeting different
> versions of C*, running with different topologies, using clean disks or
> restoring from previous snapshots.
>
>
>
> As the corpus grows, simulations matter less, and it's easier to get signal
> from noise in a customer cluster.
>
>
>
> On Tue, Mar 5, 2019 at 10:15 AM Kenneth Brotman
>  wrote:
>
> Matt,
>
>
>
> Do you anticipate having trouble getting clients to allow the collector to
> send data up to your NOC?  Wouldn’t a lot of companies be unable or uneasy
> about that?
>
>
>
> Your ML can only work if it’s got LOTS of data from many different
> scenarios.  How are you addressing that?  How are you able to get that much
> good quality data?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> *Sent:* Tuesday, March 05, 2019 10:01 AM
> *To:* 'user@cassandra.apache.org'
> *Subject:* RE: Looking for feedback on automated root-cause system
>
>
>
> I see they have a website now at https://vorstella.com/
>
>
>
>
>
> *From:* Matt Stump [mailto:mrevilgn...@gmail.com]
> *Sent:* Friday, February 22, 2019 7:56 AM
> *To:* user
> *Subject:* Re: Looking for feedback on automated root-cause system
>
>
>
> For some reason responses to the thread didn't hit my work email; I didn't
> see the responses until I checked from my personal.
>
>
>
> The way that the system works is that we install a collector that pulls a
> bunch of metrics from each node and sends it up to our NOC every minute.
> We've got a bunch of stream processors that take this data and do a bunch
> of things with it. We've got some dumb ones that check for common
> misconfigurations, bugs etc.; they also populate dashboards and a couple
> of minimal graphs. The more intelligent agents take a look at the metrics
> and they start generating a bunch of calculated/scaled metrics and events.
> If one of these triggers a threshold then we kick off the ML that does
> classification using the stored data to classify the root cause, and point
> you to the correct knowledge base article with remediation steps. Because
> we've got the cluster history we can identify a breach, and give you an SLA
> in about 1 minute. The goal is to get you from 0 to resolution as quickly
> as possible.
>
>
>
> We're looking for feedback on the existing system, do these events make
> sense, do I need to beef up a knowledge base article, did it classify
> correctly, or is there some big bug that everyone is running into that
> needs to be publicized. We're also looking for where to go next, which
> models are going to make your life easier?
>
>
>
> The system works for C*, Elastic and Kafka. We'll be doing some blog posts
> explaining in more detail how it works 

Re: Looking for feedback on automated root-cause system

2019-03-05 Thread Matthew Stump
Getting people to send data to us can be a little bit of a PITA, but it's
doable. We've got data from regulated/secure environments streaming in.
None of the data we collect is a risk, but the default is to say no and
you've got to overcome that barrier. We've been through the audit a bunch
of times, it gets easier each time because everyone asks more or less the
same questions and requires the same set of disclosures.

Cold start for AI is always an issue but we overcame it via two routes:

We had customers from a pre-existing line of business. We were probably the
first ones to run production Cassandra workloads at scale in k8s. We funded
the work behind some of the initial blog posts and had to figure out
most of the ins-and-outs of making it work. This data is good for helping
to identify edge cases and bugs that you wouldn't normally encounter, but
it's super noisy and you've got to do a lot to isolate and/or derive value
from data in the beginning if you're attempting to do root cause.

Leveraging the above we built out an extensive simulations pipeline. It
initially started as python scripts targeting k8s, but it's since been
fully automated with Spinnaker.  We have a couple of simulations running
all the time doing continuous integration with the models, collectors and
pipeline code, but will burst out to a couple hundred clusters if we need
to test something complicated. It takes just a couple of minutes to have
it spin up hundreds of different load generators, targeting different
versions of C*, running with different topologies, using clean disks or
restoring from previous snapshots.

As the corpus grows, simulations matter less, and it's easier to get signal
from noise in a customer cluster.

On Tue, Mar 5, 2019 at 10:15 AM Kenneth Brotman
 wrote:

> Matt,
>
>
>
> Do you anticipate having trouble getting clients to allow the collector to
> send data up to your NOC?  Wouldn’t a lot of companies be unable or uneasy
> about that?
>
>
>
> Your ML can only work if it’s got LOTS of data from many different
> scenarios.  How are you addressing that?  How are you able to get that much
> good quality data?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Kenneth Brotman [mailto:kenbrot...@yahoo.com]
> *Sent:* Tuesday, March 05, 2019 10:01 AM
> *To:* 'user@cassandra.apache.org'
> *Subject:* RE: Looking for feedback on automated root-cause system
>
>
>
> I see they have a website now at https://vorstella.com/
>
>
>
>
>
> *From:* Matt Stump [mailto:mrevilgn...@gmail.com]
> *Sent:* Friday, February 22, 2019 7:56 AM
> *To:* user
> *Subject:* Re: Looking for feedback on automated root-cause system
>
>
>
> For some reason responses to the thread didn't hit my work email; I didn't
> see the responses until I checked from my personal.
>
>
>
> The way that the system works is that we install a collector that pulls a
> bunch of metrics from each node and sends it up to our NOC every minute.
> We've got a bunch of stream processors that take this data and do a bunch
> of things with it. We've got some dumb ones that check for common
> misconfigurations, bugs etc.; they also populate dashboards and a couple
> of minimal graphs. The more intelligent agents take a look at the metrics
> and they start generating a bunch of calculated/scaled metrics and events.
> If one of these triggers a threshold then we kick off the ML that does
> classification using the stored data to classify the root cause, and point
> you to the correct knowledge base article with remediation steps. Because
> we've got the cluster history we can identify a breach, and give you an SLA
> in about 1 minute. The goal is to get you from 0 to resolution as quickly
> as possible.
>
>
>
> We're looking for feedback on the existing system, do these events make
> sense, do I need to beef up a knowledge base article, did it classify
> correctly, or is there some big bug that everyone is running into that
> needs to be publicized. We're also looking for where to go next, which
> models are going to make your life easier?
>
>
>
> The system works for C*, Elastic and Kafka. We'll be doing some blog posts
> explaining in more detail how it works and some of the interesting things
> we've found. For example everything everyone thought they knew about
> Cassandra thread pool tuning is wrong, nobody really knows how to tune
> Kafka for large messages, or that there are major issues with the
> Kubernetes charts that people are using.
>
>
>
>
>
>
>
> On Tue, Feb 19, 2019 at 4:40 PM Kenneth Brotman
>  wrote:
>
> Any information you can share on the inputs it needs/uses would be helpful.
>
>
>
> Kenneth Brotman
>
>
>
> *From:* daemeon reiydelle [mailto:daeme...@gmail.com]
> *Sent:* Tuesday, February 19, 2019 4:27 PM
> *To:* user
> *Subject:* Re: Looking for feedback on automated root-cause system
>
>
>
> Welcome to the world of testing predictive analytics. I will pass this on
> to my folks at Accenture, know of a couple of C* clients we run, wondering
> what you had in 

RE: Looking for feedback on automated root-cause system

2019-03-05 Thread Kenneth Brotman
Matt,

 

Do you anticipate having trouble getting clients to allow the collector to send 
data up to your NOC?  Wouldn’t a lot of companies be unable or uneasy about 
that?

 

Your ML can only work if it’s got LOTS of data from many different scenarios.  
How are you addressing that?  How are you able to get that much good quality 
data?

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Tuesday, March 05, 2019 10:01 AM
To: 'user@cassandra.apache.org'
Subject: RE: Looking for feedback on automated root-cause system

 

I see they have a website now at https://vorstella.com/

 

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email; I didn't see 
the responses until I checked from my personal. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs 
etc.. they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute. The goal is to get you from 0 to 
resolution as quickly as possible. 

 

We're looking for feedback on the existing system, do these events make sense, 
do I need to beef up a knowledge base article, did it classify correctly, or is 
there some big bug that everyone is running into that needs to be publicized. 
We're also looking for where to go next, which models are going to make your 
life easier?

 

The system works for C*, Elastic and Kafka. We'll be doing some blog posts 
explaining in more detail how it works and some of the interesting things we've 
found. For example everything everyone thought they knew about Cassandra thread 
pool tuning is wrong, nobody really knows how to tune Kafka for large messages, 
or that there are major issues with the Kubernetes charts that people are using.

 

 

 

On Tue, Feb 19, 2019 at 4:40 PM Kenneth Brotman  
wrote:

Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture, know of a couple of C* clients we run, wondering what you 
had in mind?

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features and when it 
does give you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



RE: Looking for feedback on automated root-cause system

2019-03-05 Thread Kenneth Brotman
I see they have a website now at https://vorstella.com/

 

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email; I didn't see 
the responses until I checked from my personal. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs 
etc.. they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute. The goal is to get you from 0 to 
resolution as quickly as possible. 

 

We're looking for feedback on the existing system, do these events make sense, 
do I need to beef up a knowledge base article, did it classify correctly, or is 
there some big bug that everyone is running into that needs to be publicized. 
We're also looking for where to go next, which models are going to make your 
life easier?

 

The system works for C*, Elastic and Kafka. We'll be doing some blog posts 
explaining in more detail how it works and some of the interesting things we've 
found. For example everything everyone thought they knew about Cassandra thread 
pool tuning is wrong, nobody really knows how to tune Kafka for large messages, 
or that there are major issues with the Kubernetes charts that people are using.

 

 

 

On Tue, Feb 19, 2019 at 4:40 PM Kenneth Brotman  
wrote:

Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture, know of a couple of C* clients we run, wondering what you 
had in mind?

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff. Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features and when it 
does give you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



Re: Does a co-ordinator sends a request to replica over port 9042 or 7000

2019-03-05 Thread Pranay akula
Thanks jeff,

I will look at the app configuration; that particular query will return
somewhere between 20k and 50k rows.


Regards
Pranay

On Tue, Mar 5, 2019, 9:43 AM Jeff Jirsa  wrote:

>
>
> > On Mar 5, 2019, at 7:08 AM, Pranay akula 
> wrote:
> >
> > When a co-ordinator node requests data from a replica node, will it be
> requested over port 9042 or 7000?
>
> 7000
>
> >
> > Recently I ran a query with allow filtering in lower environments; as
> soon as I ran it, I saw a spike in NTP active threads. I was trying to understand
> whether that was related to that query or whether the app was doing it.
>
> App side paging probably
>
> >
> > Is it that the request will be sent through port 9042 but the data will be sent
> over 7000?
>
> App asks for each page on 9042, coordinator asks replicas for it on 7000.
> How many rows were returned?
>
>
>
>


Re: Does a co-ordinator sends a request to replica over port 9042 or 7000

2019-03-05 Thread Jeff Jirsa



> On Mar 5, 2019, at 7:08 AM, Pranay akula  wrote:
> 
> When a co-ordinator node requests data from a replica node, will it be requested 
> over port 9042 or 7000?

7000

> 
> Recently I ran a query with allow filtering in lower environments; as soon as 
> I ran it, I saw a spike in NTP active threads. I was trying to understand whether 
> that was related to that query or whether the app was doing it.

App side paging probably 

>   
> Is it that the request will be sent through port 9042 but the data will be sent over 
> 7000?

App asks for each page on 9042, coordinator asks replicas for it on 7000. How 
many rows were returned?
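
For what it's worth, you can watch the same behaviour from cqlsh; this is only an
illustrative sketch, and the keyspace, table and column names are placeholders,
not the original query:

-- client-side page size; each page is requested over the native port (9042)
PAGING 5000;
-- tracing then shows the coordinator asking the replicas for the data
TRACING ON;
SELECT * FROM my_ks.my_table WHERE some_col = 1 ALLOW FILTERING;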





Does a co-ordinator sends a request to replica over port 9042 or 7000

2019-03-05 Thread Pranay akula
When a co-ordinator node requests data from a replica node, will it be
requested over port 9042 or 7000?

Recently I ran a query with allow filtering in lower environments; as soon
as I ran it, I saw a spike in NTP active threads. I was trying to understand
whether that was related to that query or whether the app was doing it.

Is it that the request will be sent through port 9042 but the data will be sent
over 7000?

Thanks


Re: Upgrade 3.11.1 to 3.11.4

2019-03-05 Thread Ioannis Zafiropoulos
Hi Kenneth,

Thanks for your interest in helping. I had to take a decision quickly because it
was a production cluster. So, long story short, I let the cluster finish
the decommission process before touching it. When the decommissioned node left
the cluster I did a rolling restart and the nodes started behaving again
without errors; auto-compaction also resumed and all nodes had accumulated
a lot of files to compact. Then I performed a rolling upgrade from 3.11.1
to 3.11.4 which went very smoothly.

In retrospect to answer your questions:
> Was the cluster running ok before decommissioning the node?
Yes

> Why were you decommissioning the node?
Management decision, we wanted just  to shrink the cluster.

> Were you upgrading from 3.11.1 to 3.11.4?
No, that was not the initial intention. I arrived at that conclusion after
I realized I stepped into this bug on the rest of the nodes.
"Prevent compaction strategies from looping indefinitely" CASSANDRA-14079


Thanks again!


On Thu, Feb 28, 2019 at 10:45 AM Kenneth Brotman
 wrote:

> Hi John,
>
>
>
> Was the cluster running ok before decommissioning the node?
>
> Why were you decommissioning the node?
>
> Were you upgrading from 3.11.1 to 3.11.4?
>
>
>
>
>
> *From:* Ioannis Zafiropoulos [mailto:john...@gmail.com]
> *Sent:* Wednesday, February 27, 2019 7:33 AM
> *To:* user@cassandra.apache.org
> *Subject:* Upgrade 3.11.1 to 3.11.4
>
>
>
> Hi all,
>
>
>
> During a decommission on a production cluster (9 nodes) we have some
> issues on the remaining nodes regarding compaction, and I have some
> questions about that:
>
>
>
> One remaining node who has stopped compacting, due to some bug
>  in 3.11.1, *has
> received all* the streaming files from the decommission node
> (decommissioning is still in progress for the rest of the cluster). Could I
> upgrade this node to 3.11.4 and restart it?
>
>
>
> Some other nodes which *are still receiving* files appear to do very
> little to no auto-compaction from nodetool tpstats. Should I wait for
> streaming to complete or should I upgrade these nodes as well and restart
> them? What would happen if I bounce such a node? will the whole process of
> decommissioning fail?
>
>
>
> Do you recommend to eventually do a rolling upgrade to 3.11.4 or choose
> another version?
>
>
>
> Thanks in advance for your help,
>
> John Zaf
>


Re: About using Ec2MultiRegionSnitch

2019-03-05 Thread Jean Carlo
Awesome

Thank you very much

Cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Tue, Mar 5, 2019 at 2:47 PM Jeff Jirsa  wrote:

>
>
>
> > On Mar 5, 2019, at 5:32 AM, Jean Carlo 
> wrote:
> >
> > Hello Jeff, thank you for the answer. But what will be the advantage of
> GossipingPropertyFileSnitch over Ec2MultiRegionSnitch exactly ? The
> possibility to name the DCs ?
>
>
> Yes
>
> And if you ever move out of aws you won’t have any problems
>
> >
> > And another question, are you telling me that
> GossipingPropertyFileSnitch works fine in AWS and there is no need of
> Ec2Snitch?
>
> Yea, just set the dc name to the region and the rack to the AZ in the
> property file
>
>
>
>


RE: [EXTERNAL] Re: A Question About Hints

2019-03-05 Thread Durity, Sean R
Versions 2.0 and 2.1 were generally very stable, so I can understand a 
reticence to move when there are so many other things competing for time and 
attention.

Sean Durity




From: shalom sagges 
Sent: Monday, March 04, 2019 4:21 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: A Question About Hints

Everyone really should move off of the 2.x versions just like you are doing.
Tell me about it... But since there are a lot of groups involved, these things 
take time unfortunately.

Thanks for your assistance Kenneth

On Mon, Mar 4, 2019 at 11:04 PM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
Since you are in the process of upgrading, I’d do nothing on the settings right 
now.  But if you wanted to do something on the settings in the meantime, based 
on my read of the information available, I’d maybe double the default settings. 
The upgrade will help a lot of things as you know.

Everyone really should move off of the 2.x versions just like you are doing.

From: shalom sagges 
[mailto:shalomsag...@gmail.com]
Sent: Monday, March 04, 2019 12:34 PM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

See my comments inline.

Do the 8 nodes clusters have the problem too?
Yes

To the same extent?
It depends on the throughput, but basically the smaller clusters get low 
throughput, so the problem is naturally smaller.

Is it any cluster across multi-DC’s?
Yes

Do all the clusters use nodes with similar specs?
All nodes have similar specs within a cluster but different specs on different 
clusters.

The version of Cassandra you are on can make a difference.  What version are 
you on?
Currently I'm on various versions, 2.0.14, 2.1.15 and 3.0.12. In the process of 
upgrading to 3.11.4

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU
 where he briefly mentions you can get into trouble if you go too fast or too 
slow?
I guess you can say it about almost any parameter you change :)

BTW, I thought the comments at the end of the article you mentioned were really 
good.
The entire article is very good, but I wonder if it's still valid since it was 
created around 4 years ago.

Thanks!




On Mon, Mar 4, 2019 at 9:37 PM Kenneth Brotman 
mailto:kenbrotman@yahoocom.invalid>> wrote:
Makes sense.  If you have time and don’t mind, could you answer the following:
Do the 8 nodes clusters have the problem too?
To the same extent?
Is it just the clusters with the large node count?
Is it any cluster across multi-DC’s?
Do all the clusters use nodes with similar specs?

The version of Cassandra you are on can make a difference.  What version are 
you on?

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU
 where he briefly mentions you can get into trouble if you go too fast or too 
slow?
BTW, I thought the comments at the end of the article you mentioned were really 
good.



From: shalom sagges 
[mailto:shalomsag...@gmail.com]
Sent: Monday, March 04, 2019 11:04 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

It varies...
Some clusters have 48 nodes, others 24 nodes and some 8 nodes.
Both settings are on default.

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.
That's the plan, but I thought I might first get some valuable information from 
someone in the community that has already experienced in this type of change.

Thanks!

On Mon, Mar 4, 2019 at 8:27 PM Kenneth Brotman 
mailto:kenbrot...@yahoo.com.invalid>> wrote:
It sounds like your use case might be appropriate for tuning those two settings 
some.

How many nodes are in the cluster?
Are both settings definitely on the default values currently?

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

Then of course share your results with us.

From: shalom sagges 
[mailto:shalomsag...@gmail.com]
Sent: Monday, March 04, 2019 9:54 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

Hi Kenneth,

The concern is that in some cases, hints accumulate on nodes, and it takes a 
while until they are delivered (multi DCs).
I see that 

Re: About using Ec2MultiRegionSnitch

2019-03-05 Thread Jeff Jirsa




> On Mar 5, 2019, at 5:32 AM, Jean Carlo  wrote:
> 
> Hello Jeff, thank you for the answer. But what will be the advantage of 
> GossipingPropertyFileSnitch over Ec2MultiRegionSnitch exactly ? The 
> possibility to name the DCs ?


Yes

And if you ever move out of aws you won’t have any problems

> 
> And another question, are you telling me that GossipingPropertyFileSnitch 
> works fine in AWS and there is no need of Ec2Snitch?

Yea, just set the dc name to the region and the rack to the AZ in the property 
file
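
As a concrete sketch of that (the region and AZ values here are just examples), a
node in eu-west-1, availability zone eu-west-1a, would set endpoint_snitch:
GossipingPropertyFileSnitch in cassandra.yaml and have a cassandra-rackdc.properties
along these lines:

dc=eu-west-1
rack=eu-west-1a
# uncomment to prefer private IPs for intra-DC traffic, as Ec2MultiRegionSnitch does
# prefer_local=true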





Re: About using Ec2MultiRegionSnitch

2019-03-05 Thread Jean Carlo
Hello Jeff, thank you for the answer. But what will be the advantage of
GossipingPropertyFileSnitch over Ec2MultiRegionSnitch exactly ? The
possibility to name the DCs ?

And another question, are you telling me that GossipingPropertyFileSnitch
works fine in AWS and there is no need of Ec2Snitch?



Thank you ?


Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Tue, Mar 5, 2019 at 2:24 PM Jeff Jirsa  wrote:

> Ec2 multi should work fine in one region, but consider using
> GossipingPropertyFileSnitch if there’s even a chance you’ll want something
> other than AWS regions as dc names - multicloud, hybrid, analytics DCs, etc
>
>
>
> --
> Jeff Jirsa
>
>
> On Mar 5, 2019, at 5:12 AM, Jean Carlo  wrote:
>
> Hello everyone, I am preparing a single-DC cluster in AWS. For the moment
> there is no need to have a multi-DC cluster, but I would like to avoid a
> snitch migration in the future. I would like to know if
> Ec2MultiRegionSnitch also works for a single-DC cluster, or whether the
> migration from Ec2Snitch to Ec2MultiRegionSnitch is inevitable.
>
> Thank you
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>


Re: About using Ec2MultiRegionSnitch

2019-03-05 Thread Jeff Jirsa
Ec2 multi should work fine in one region, but consider using 
GossipingPropertyFileSnitch if there’s even a chance you’ll want something 
other than AWS regions as dc names - multicloud, hybrid, analytics DCs, etc



-- 
Jeff Jirsa


> On Mar 5, 2019, at 5:12 AM, Jean Carlo  wrote:
> 
> Hello everyone, I am preparing a single-DC cluster in AWS. For the moment 
> there is no need to have a multi-DC cluster, but I would like to avoid a 
> snitch migration in the future. I would like to know if Ec2MultiRegionSnitch 
> also works for a single-DC cluster, or whether the migration from Ec2Snitch to 
> Ec2MultiRegionSnitch is inevitable.
> 
> Thank you
> 
> Jean Carlo
> 
> "The best way to predict the future is to invent it" Alan Kay


About using Ec2MultiRegionSnitch

2019-03-05 Thread Jean Carlo
Hello everyone, I am preparing a single-DC cluster in AWS. For the moment
there is no need to have a multi-DC cluster, but I would like to avoid a
snitch migration in the future. I would like to know if
Ec2MultiRegionSnitch also works for a single-DC cluster, or whether the
migration from Ec2Snitch to Ec2MultiRegionSnitch is inevitable.

Thank you

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay