Re: Nodetool repair multiple dc

2018-04-13 Thread Alexander Dejanovski
Hi Abdul,

Reaper has been used in production for several years now, by many companies.
I've seen it handling 100s of clusters and 1000s of nodes with a single
Reaper process.
Check the docs on cassandra-reaper.io to see which architecture matches
your cluster : http://cassandra-reaper.io/docs/usage/multi_dc/

Cheers,

On Fri, Apr 13, 2018 at 4:38 PM Rahul Singh wrote:

> It makes sense that it takes a long time, since it has to reconcile against
> replicas in all DCs. I leverage commercial tools for production clusters,
> but I’m pretty sure Reaper is the best open source option. Otherwise you’ll
> waste a lot of time trying to figure it out on your own. No need to
> reinvent the wheel.
>
> On Apr 12, 2018, 11:02 PM -0400, Abdul Patel wrote:
>
> Hi All,
>
> I have an 18-node cluster across 3 DCs. If I try to run incremental repair on
> a single node, it takes forever, sometimes 45 min to 1 hr, and sometimes it times out. So
> I started running "nodetool repair -dc dc1" for each DC one by one, which
> works fine. Do we have a better way to handle this?
> I am thinking about exploring Cassandra Reaper. Has anyone used it
> in prod?
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Nodetool repair multiple dc

2018-04-13 Thread Rahul Singh
It makes sense that it takes a long time, since it has to reconcile against
replicas in all DCs. I leverage commercial tools for production clusters, but
I’m pretty sure Reaper is the best open source option. Otherwise you’ll waste a
lot of time trying to figure it out on your own. No need to reinvent the wheel.

On Apr 12, 2018, 11:02 PM -0400, Abdul Patel wrote:
> Hi All,
>
> I have an 18-node cluster across 3 DCs. If I try to run incremental repair on
> a single node, it takes forever, sometimes 45 min to 1 hr, and sometimes it
> times out. So I started running "nodetool repair -dc dc1" for each DC one by
> one, which works fine. Do we have a better way to handle this?
> I am thinking about exploring Cassandra Reaper. Has anyone used it in
> prod?
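
For reference, the per-DC workaround described in the quoted question could be scripted roughly as below. This is only a sketch, not a replacement for Reaper: it assumes `nodetool` is on the PATH of the node where it runs, and the data center names other than "dc1" are assumptions (the thread only names "dc1"). The `run` parameter is a hypothetical hook so the loop can be tried without a live cluster.

```python
import subprocess
from typing import Callable, List, Sequence


def build_repair_commands(datacenters: Sequence[str]) -> List[List[str]]:
    """Return one `nodetool repair -dc <name>` command per data center."""
    return [["nodetool", "repair", "-dc", dc] for dc in datacenters]


def repair_each_dc(
    datacenters: Sequence[str],
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Run the repairs one data center at a time, stopping at the first failure.

    Returns the names of the data centers whose repair command exited with 0.
    """
    done: List[str] = []
    for cmd in build_repair_commands(datacenters):
        # A non-zero exit code means the repair failed or timed out,
        # so stop instead of starting the next data center.
        if run(cmd) != 0:
            break
        done.append(cmd[-1])
    return done
```

Calling `repair_each_dc(["dc1", "dc2", "dc3"])` would run the three repairs sequentially; here "dc2" and "dc3" are placeholder names.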


Re: Sorl/DSE Spark

2018-04-13 Thread Niclas Hedhman
On Fri, Apr 13, 2018, 18:40 Ben Bromhead  wrote:

>
> DSE is literally in the title.
>

:-D who reads the title???

Sorry...


RE: Mailing list server IPs

2018-04-13 Thread Jacques-Henri Berthemet
I checked with IT, and I did miss an email during the period when I got the last
bounce. It’s not a very big deal, but I’d like to have it fixed if possible.

Gmail servers are very picky about SMTP traffic and reject a lot of things.

--
Jacques-Henri Berthemet

From: Nicolas Guyomar [mailto:nicolas.guyo...@gmail.com]
Sent: Friday, April 13, 2018 3:15 PM
To: user@cassandra.apache.org
Subject: Re: Mailing list server IPs

Hi,

I receive similar messages from time to time, and I'm using Gmail ;)  I believe 
I never missed a mail on the ML and that you can safely ignore this message

On 13 April 2018 at 15:06, Jacques-Henri Berthemet wrote:
Hi,

I’m getting bounce messages from the ML from time to time, see attached 
example. Our IT told me that they need to whitelist all IPs used by Cassandra 
ML server. Is there a way to get those IPs?

Sorry if it’s not really related to Cassandra itself but I didn’t find anything
in http://untroubled.org/ezmlm/ezman/ezman5.html commands.

Regards,
--
Jacques-Henri Berthemet


-- Forwarded message --
From: "user-h...@cassandra.apache.org"
To: Jacques-Henri Berthemet
Cc:
Bcc:
Date: Fri, 6 Apr 2018 20:47:22 +
Subject: Warning from user@cassandra.apache.org
Hi! This is the ezmlm program. I'm managing the
user@cassandra.apache.org mailing list.


Messages to you from the user mailing list seem to
have been bouncing. I've attached a copy of the first bounce
message I received.

If this message bounces too, I will send you a probe. If the probe bounces,
I will remove your address from the user mailing list,
without further notice.


I've kept a list of which messages from the user mailing list have
bounced from your address.

Copies of these messages may be in the archive.
To retrieve a set of messages 123-145 (a maximum of 100 per request),
send a short message to:

To receive a subject and author list for the last 100 or so messages,
send a short message to:

Here are the message numbers:

   60535
   60536
   60548

--- Enclosed is a copy of the bounce message I received.

Return-Path: <>
Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
Date: 27 Mar 2018 14:22:11 -
From: mailer-dae...@apache.org
To: user-return-605...@cassandra.apache.org
Subject: failure notice




-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Mailing list server IPs

2018-04-13 Thread Nicolas Guyomar
Hi,

I receive similar messages from time to time, and I'm using Gmail ;)  I
believe I never missed a mail on the ML and that you can safely ignore this
message

On 13 April 2018 at 15:06, Jacques-Henri Berthemet <
jacques-henri.berthe...@genesys.com> wrote:

> Hi,
>
>
>
> I’m getting bounce messages from the ML from time to time, see attached
> example. Our IT told me that they need to whitelist all IPs used by
> Cassandra ML server. Is there a way to get those IPs?
>
>
>
> Sorry if it’s not really related to Cassandra itself but I didn’t find
> anything in http://untroubled.org/ezmlm/ezman/ezman5.html commands.
>
>
>
> Regards,
>
> --
>
> Jacques-Henri Berthemet
>
>
> -- Forwarded message --
> From: "user-h...@cassandra.apache.org" 
> To: Jacques-Henri Berthemet 
> Cc:
> Bcc:
> Date: Fri, 6 Apr 2018 20:47:22 +
> Subject: Warning from user@cassandra.apache.org
> Hi! This is the ezmlm program. I'm managing the
> user@cassandra.apache.org mailing list.
>
>
> Messages to you from the user mailing list seem to
> have been bouncing. I've attached a copy of the first bounce
> message I received.
>
> If this message bounces too, I will send you a probe. If the probe bounces,
> I will remove your address from the user mailing list,
> without further notice.
>
>
> I've kept a list of which messages from the user mailing list have
> bounced from your address.
>
> Copies of these messages may be in the archive.
> To retrieve a set of messages 123-145 (a maximum of 100 per request),
> send a short message to:
>
>
> To receive a subject and author list for the last 100 or so messages,
> send a short message to:
>
>
> Here are the message numbers:
>
>    60535
>    60536
>    60548
>
> --- Enclosed is a copy of the bounce message I received.
>
> Return-Path: <>
> Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
> Date: 27 Mar 2018 14:22:11 -
> From: mailer-dae...@apache.org
> To: user-return-605...@cassandra.apache.org
> Subject: failure notice
>
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>


Mailing list server IPs

2018-04-13 Thread Jacques-Henri Berthemet
Hi,

I’m getting bounce messages from the ML from time to time, see attached 
example. Our IT told me that they need to whitelist all IPs used by Cassandra 
ML server. Is there a way to get those IPs?

Sorry if it’s not really related to Cassandra itself but I didn’t find anything 
in http://untroubled.org/ezmlm/ezman/ezman5.html commands.

Regards,
--
Jacques-Henri Berthemet
--- Begin Message ---
Hi! This is the ezmlm program. I'm managing the
user@cassandra.apache.org mailing list.


Messages to you from the user mailing list seem to
have been bouncing. I've attached a copy of the first bounce
message I received.

If this message bounces too, I will send you a probe. If the probe bounces,
I will remove your address from the user mailing list,
without further notice.


I've kept a list of which messages from the user mailing list have
bounced from your address.

Copies of these messages may be in the archive.
To retrieve a set of messages 123-145 (a maximum of 100 per request),
send a short message to:
   

To receive a subject and author list for the last 100 or so messages,
send a short message to:
   

Here are the message numbers:

   60535
   60536
   60548

--- Enclosed is a copy of the bounce message I received.

Return-Path: <>
Received: (qmail 8848 invoked for bounce); 27 Mar 2018 14:22:11 -
Date: 27 Mar 2018 14:22:11 -
From: mailer-dae...@apache.org
To: user-return-605...@cassandra.apache.org
Subject: failure notice


--- End Message ---

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Re: Sorl/DSE Spark

2018-04-13 Thread Ben Bromhead
Thanks Jeff.

On Thu, Apr 12, 2018, 21:37 Jeff Jirsa  wrote:

> Pretty sure Ben meant that datastax produces DSE, not Cassandra, and since
> the questions specifically mentions DSE in the subject (implying that the
> user is going to be running either solr or spark within DSE to talk to
> cassandra), Ben’s recommendation seems quite reasonable to me.
>
>
>
> --
> Jeff Jirsa
>
>
> On Apr 12, 2018, at 6:23 PM, Niclas Hedhman  wrote:
>
> Ben,
>
> 1. I don't see anything in this thread that is DSE specific, so I think it
> belongs here.
>
> 2. Careful when you say that Datastax produces Cassandra. Cassandra is a
> product of Apache Software Foundation, and no one else. You, Ben, should be
> very well aware of this, to avoid further trademark issues between Datastax
> and ASF.
>
> Cheers
> Niclas Hedhman
> Member of ASF
>
> On Thu, Apr 12, 2018 at 9:57 PM, Ben Bromhead  wrote:
>
>> Folks this is the user list for Apache Cassandra. I would suggest
>> redirecting the question to Datastax the commercial entity that produces
>> the software.
>>
>> On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> Best practise is to use a dedicated DC for analytics separated from the
>>> hot DC.
>>>
>>> Le jeu. 12 avr. 2018 à 15:45, sha p  a écrit :
>>>
>>>> Got it.
>>>> Thank you so much for your detailed explanation.
>>>>
>>>> Regards,
>>>> Shyam
>>>>
>>>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith wrote:
>>>>
>>>>> Cassandra tends to be used in a lot of web applications. Its loads
>>>>> are more natural and evenly distributed, like people logging on throughout
>>>>> the day. And people operating it tend to be latency sensitive.
>>>>>
>>>>> Spark, on the other hand, will try to complete its tasks as quickly as
>>>>> possible. This might mean bulk reading from Cassandra at 10 times the
>>>>> usual operations load, but for only, say, 5 minutes every half hour (however
>>>>> long it takes to read in the data for a job and whenever that job is run).
>>>>> In this case, during those 5 minutes your normal operations work (customers)
>>>>> is going to experience a lot of latency.
>>>>>
>>>>> This even happens with streaming jobs: every time Spark goes to
>>>>> interact with Cassandra it does so very quickly, hammers it for reads and
>>>>> then does its own stuff until it needs to write things out. This might
>>>>> equate to intermittent latency spikes.
>>>>>
>>>>> In theory, you can throttle your reads and writes but I don’t know
>>>>> much about this and don’t see people actually doing it.
>>>>>
>>>>> Regards,
>>>>> Evelyn.
>>>>>
>>>>> On 12 Apr 2018, at 4:30 pm, sha p wrote:
>>>>>
>>>>> Evelyn,
>>>>> Can you please elaborate on the below?
>>>>> Spark is notorious for causing latency spikes in Cassandra which is
>>>>> not great if you are sensitive to that.
>>>>>
>>>>>
>>>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith wrote:
>>>>>
>>>>>> Are you building a search engine -> Solr
>>>>>> Are you building an analytics function -> Spark
>>>>>>
>>>>>> I feel they are used in significantly different use cases. What are
>>>>>> you trying to build?
>>>>>>
>>>>>> If it’s an analytics functionality that’s separate from your
>>>>>> operations functionality I’d build it in its own DC. Spark is notorious
>>>>>> for causing latency spikes in Cassandra which is not great if you are
>>>>>> sensitive to that.
>>>>>>
>>>>>> Regards,
>>>>>> Evelyn.
>>>>>>
>>>>>> On 12 Apr 2018, at 6:55 am, kooljava2 wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We are exploring configuring Solr/Spark. Wanted to get input on
>>>>>> this.
>>>>>> 1) How do we decide which one to use?
>>>>>> 2) Do we run this on a DC where there is less workload?
>>>>>>
>>>>>> Any other suggestions or comments are appreciated.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>>
> --
>> Ben Bromhead
>> CTO | Instaclustr 
>> +1 650 284 9692
>> Reliability at Scale
>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>>
>
>
>
> --
> Niclas Hedhman, Software Developer
> http://zest.apache.org - New Energy for Java
>
> --
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
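
As an illustration of the remark in the quoted thread that reads and writes can in theory be throttled, here is a minimal token-bucket sketch. It is a generic client-side rate limiter, not a Spark or Cassandra API: the class name, the parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow roughly `rate` operations per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; return False when the caller should back off."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A batch job would call `try_acquire()` before each read or write and sleep briefly when it returns False, which caps the burst a latency-sensitive cluster sees at the cost of a longer job.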


Re: Sorl/DSE Spark

2018-04-13 Thread Ben Bromhead
On Thu, Apr 12, 2018, 21:23 Niclas Hedhman  wrote:

> Ben,
>
> 1. I don't see anything in this thread that is DSE specific, so I think it
> belongs here.
>
DSE is literally in the title.


> 2. Careful when you say that Datastax produces Cassandra. Cassandra is a
> product of Apache Software Foundation, and no one else. You, Ben, should be
> very well aware of this, to avoid further trademark issues between Datastax
> and ASF.
>
Given the context and subject, the software I was referring to is DSE.

Mind you, it would be hilarious if this email caused more trademark issues
with Datastax.



> Cheers
> Niclas Hedhman
> Member of ASF
>
> On Thu, Apr 12, 2018 at 9:57 PM, Ben Bromhead  wrote:
>
>> Folks this is the user list for Apache Cassandra. I would suggest
>> redirecting the question to Datastax the commercial entity that produces
>> the software.
>>
>> On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> Best practise is to use a dedicated DC for analytics separated from the
>>> hot DC.
>>>
>>> Le jeu. 12 avr. 2018 à 15:45, sha p  a écrit :
>>>
>>>> Got it.
>>>> Thank you so much for your detailed explanation.
>>>>
>>>> Regards,
>>>> Shyam
>>>>
>>>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith wrote:
>>>>
>>>>> Cassandra tends to be used in a lot of web applications. Its loads
>>>>> are more natural and evenly distributed, like people logging on throughout
>>>>> the day. And people operating it tend to be latency sensitive.
>>>>>
>>>>> Spark, on the other hand, will try to complete its tasks as quickly as
>>>>> possible. This might mean bulk reading from Cassandra at 10 times the
>>>>> usual operations load, but for only, say, 5 minutes every half hour (however
>>>>> long it takes to read in the data for a job and whenever that job is run).
>>>>> In this case, during those 5 minutes your normal operations work (customers)
>>>>> is going to experience a lot of latency.
>>>>>
>>>>> This even happens with streaming jobs: every time Spark goes to
>>>>> interact with Cassandra it does so very quickly, hammers it for reads and
>>>>> then does its own stuff until it needs to write things out. This might
>>>>> equate to intermittent latency spikes.
>>>>>
>>>>> In theory, you can throttle your reads and writes but I don’t know
>>>>> much about this and don’t see people actually doing it.
>>>>>
>>>>> Regards,
>>>>> Evelyn.
>>>>>
>>>>> On 12 Apr 2018, at 4:30 pm, sha p wrote:
>>>>>
>>>>> Evelyn,
>>>>> Can you please elaborate on the below?
>>>>> Spark is notorious for causing latency spikes in Cassandra which is
>>>>> not great if you are sensitive to that.
>>>>>
>>>>>
>>>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith wrote:
>>>>>
>>>>>> Are you building a search engine -> Solr
>>>>>> Are you building an analytics function -> Spark
>>>>>>
>>>>>> I feel they are used in significantly different use cases. What are
>>>>>> you trying to build?
>>>>>>
>>>>>> If it’s an analytics functionality that’s separate from your
>>>>>> operations functionality I’d build it in its own DC. Spark is notorious
>>>>>> for causing latency spikes in Cassandra which is not great if you are
>>>>>> sensitive to that.
>>>>>>
>>>>>> Regards,
>>>>>> Evelyn.
>>>>>>
>>>>>> On 12 Apr 2018, at 6:55 am, kooljava2 wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We are exploring configuring Solr/Spark. Wanted to get input on
>>>>>> this.
>>>>>> 1) How do we decide which one to use?
>>>>>> 2) Do we run this on a DC where there is less workload?
>>>>>>
>>>>>> Any other suggestions or comments are appreciated.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>>
> --
>> Ben Bromhead
>> CTO | Instaclustr 
>> +1 650 284 9692
>> Reliability at Scale
>> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>>
>
>
>
> --
> Niclas Hedhman, Software Developer
> http://zest.apache.org - New Energy for Java
>
-- 
Ben Bromhead
CTO | Instaclustr 
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer