Cassandra tends to be used in a lot of web applications. Its load is more 
natural and evenly distributed, like people logging on throughout the day, and 
the people operating it tend to be latency sensitive.

Spark, on the other hand, will try to complete its tasks as quickly as 
possible. This might mean bulk reading from Cassandra at 10 times the usual 
operations load, but only for, say, 5 minutes every half hour (however long it 
takes to read in the data for a job, whenever that job runs). In that case, 
during those 5 minutes your normal operations workload (customers) is going to 
experience a lot of latency.
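
To put rough numbers on that, here is a minimal sketch of the arithmetic. The 
function name and the baseline of 1000 ops/s are hypothetical, picked only for 
illustration; the 10x, 5 minute and 30 minute figures are the ones used above.

```python
def load_profile(baseline_ops, burst_multiplier, burst_minutes, period_minutes):
    """Return (peak, average) ops/s seen by the cluster over one period.

    Assumes the bulk read adds burst_multiplier * baseline_ops on top of the
    normal operational load for burst_minutes out of every period_minutes.
    """
    burst_ops = baseline_ops * burst_multiplier
    peak = baseline_ops + burst_ops
    average = baseline_ops + burst_ops * burst_minutes / period_minutes
    return peak, average


# Hypothetical baseline of 1000 ops/s with the burst pattern described above.
peak, average = load_profile(1000, 10, 5, 30)
print(peak, round(average))  # peak is 11000, average is about 2667
```

The averaged figure looks manageable, which is why capacity planning on 
averages hides the problem: latency is driven by the peak, not the mean.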

This even happens with streaming jobs. Every time Spark interacts with 
Cassandra it does so very quickly: it hammers it for reads and then does its 
own work until it needs to write things out. This can show up as intermittent 
latency spikes.

In theory, you can throttle your reads and writes, but I don’t know much about 
this and don’t see people actually doing it.
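
For what throttling means in principle, here is a minimal sketch of a 
token-bucket limiter on the client side. It is an illustration only, not how 
Spark or any Cassandra connector implements it; the class name, parameters and 
injectable clock are all hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical limiter: at most `rate` operations per second on average,
    with bursts of up to `capacity` operations."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock          # injectable so the behaviour can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self, n=1):
        """Take n tokens if available; return False instead of blocking."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A reader or writer would call try_acquire() before each request and back off 
when it returns False, trading job run time for a flatter load on the cluster.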

Regards,
Evelyn.

> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
> 
> Evelyn,
> Can you please elaborate on the statement below?
> Spark is notorious for causing latency spikes in Cassandra which is not great 
> if you are sensitive to that. 
> 
> 
> On Thu, 12 Apr 2018, 10:46 Evelyn Smith, <u5015...@gmail.com 
> <mailto:u5015...@gmail.com>> wrote:
> Are you building a search engine -> Solr
> Are you building an analytics function -> Spark
> 
> I feel they are used in significantly different use cases, what are you 
> trying to build?
> 
> If it’s analytics functionality that’s separate from your operations 
> functionality I’d build it in its own DC. Spark is notorious for causing 
> latency spikes in Cassandra which is not great if you are sensitive to 
> that. 
> 
> Regards,
> Evelyn.
>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID 
>> <mailto:koolja...@yahoo.com.INVALID>> wrote:
>> 
>> Hello,
>> 
>> We are exploring configuring Solr/Spark and wanted to get input on this.
>> 1) How do we decide which one to use?
>> 2) Do we run this on a DC where there is less workload?
>> 
>> Any other suggestion or comments are appreciated.
>> 
>> Thank you.
>> 
> 
