Re: Sorl/DSE Spark

Niclas Hedhman Thu, 12 Apr 2018 18:24:07 -0700

Ben,

1. I don't see anything in this thread that is DSE specific, so I think it
belongs here.


2. Careful when you say that Datastax produces Cassandra. Cassandra is a
product of the Apache Software Foundation, and no one else. You, Ben, should be
very well aware of this, to avoid further trademark issues between Datastax
and the ASF.

Cheers
Niclas Hedhman
Member of ASF

On Thu, Apr 12, 2018 at 9:57 PM, Ben Bromhead <b...@instaclustr.com> wrote:

> Folks, this is the user list for Apache Cassandra. I would suggest
> redirecting the question to Datastax, the commercial entity that produces
> the software.
>
> On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
>> Best practise is to use a dedicated DC for analytics separated from the
>> hot DC.
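
The dedicated-DC advice above corresponds to Cassandra's per-datacenter replication settings. A minimal sketch, assuming a hypothetical keyspace `app_data` and hypothetical datacenter names `operations` and `analytics` (real names must match what the cluster's snitch reports). The helper only builds the CQL text; it does not connect to a cluster.

```python
def keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor.
    All names passed in here are illustrative, not taken from the thread.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, rf) for dc, rf in replicas_per_dc.items()]
    return "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {%s};" % (
        keyspace,
        ", ".join(options),
    )


# Hypothetical layout: 3 replicas in the hot DC, 2 in the analytics DC.
statement = keyspace_cql("app_data", {"operations": 3, "analytics": 2})
print(statement)
```

With a layout like this, analytics clients would typically connect to nodes in the analytics DC and use a DC-local consistency level such as LOCAL_QUORUM, so their reads are served there rather than by the hot DC.
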
>>
>> On Thu, 12 Apr 2018 at 15:45, sha p <shatestt...@gmail.com> wrote:
>>
>>> Got it.
>>> Thank you so much for your detailed explanation.
>>>
>>> Regards,
>>> Shyam
>>>
>>> On Thu, 12 Apr 2018, 17:37 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>
>>>> Cassandra tends to be used in a lot of web applications. Its loads are
>>>> more natural and evenly distributed, like people logging on throughout the
>>>> day, and the people operating it tend to be latency sensitive.
>>>>
>>>> Spark, on the other hand, will try to complete its tasks as quickly as
>>>> possible. This might mean bulk reading from Cassandra at 10 times the
>>>> usual operations load, but only for, say, 5 minutes every half hour (however
>>>> long it takes to read in the data for a job, and whenever that job is run).
>>>> In that case, during those 5 minutes your normal operations workload
>>>> (customers) is going to experience a lot of latency.
>>>>
>>>> This happens even with streaming jobs: every time Spark goes to
>>>> interact with Cassandra it does so very quickly, hammers it for reads, and
>>>> then does its own stuff until it needs to write things out. This might
>>>> equate to intermittent latency spikes.
>>>>
>>>> In theory, you can throttle your reads and writes but I don’t know much
>>>> about this and don’t see people actually doing it.
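
One common way to throttle reads or writes on the client side is a token bucket. The sketch below is a generic illustration of that technique, not the API of Spark or of any Cassandra connector (a connector may expose its own rate settings; check its documentation). The class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow about `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum stored tokens (burst size)
        self.clock = clock               # injectable so the logic can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n=1):
        """Take n tokens if available; return False instead of waiting."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def acquire(self, n=1, sleep=time.sleep):
        """Block until n tokens are available, then take them."""
        if n > self.capacity:
            raise ValueError("n exceeds bucket capacity")
        while not self.try_acquire(n):
            # Sleep roughly as long as it takes to refill the missing tokens.
            sleep((n - self.tokens) / self.rate)
```

A bulk job would call `acquire()` before each read or write request, which caps the sustained request rate at `rate` per second and so trades job duration for a lower peak load on the cluster.
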
>>>>
>>>> Regards,
>>>> Evelyn.
>>>>
>>>> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>>>>
>>>> Evelyn,
>>>> Can you please elaborate on the statement below?
>>>> Spark is notorious for causing latency spikes in Cassandra which is not
>>>> great if you are sensitive to that.
>>>>
>>>>
>>>> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com> wrote:
>>>>
>>>>> Are you building a search engine -> Solr
>>>>> Are you building an analytics function -> Spark
>>>>>
>>>>> I feel they are used in significantly different use cases. What are
>>>>> you trying to build?
>>>>>
>>>>> If it’s analytics functionality that’s separate from your
>>>>> operations functionality, I’d build it in its own DC. Spark is notorious
>>>>> for causing latency spikes in Cassandra, which is not great if you are
>>>>> sensitive to that.
>>>>>
>>>>> Regards,
>>>>> Evelyn.
>>>>>
>>>>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID>
>>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> We are exploring configuring Solr/Spark and wanted to get input on
>>>>> this.
>>>>> 1) How do we decide which one to use?
>>>>> 2) Do we run this on a DC where there is less workload?
>>>>>
>>>>> Any other suggestions or comments are appreciated.
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>>
>>>> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>



-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java
