Cassandra is used in a lot of web applications. Its load tends to be more natural and evenly distributed, like people logging on throughout the day, and the people operating it tend to be latency sensitive.
Spark, on the other hand, will try to complete its tasks as quickly as possible. This might mean bulk reading from Cassandra at 10 times the usual operational load, but only for, say, 5 minutes every half hour (however long it takes to read in the data for a job, and whenever that job is run). During those 5 minutes your normal operational workload (your customers) is going to experience a lot of latency. This happens even with streaming jobs: every time Spark interacts with Cassandra it does so very quickly, hammers it with reads, and then does its own work until it needs to write things out. This can show up as intermittent latency spikes.

In theory, you can throttle your reads and writes, but I don’t know much about this and don’t see people actually doing it.

Regards,
Evelyn.

> On 12 Apr 2018, at 4:30 pm, sha p <shatestt...@gmail.com> wrote:
>
> Evelyn,
> Can you please elaborate on the below?
> Spark is notorious for causing latency spikes in Cassandra which is not great
> if you are sensitive to that.
>
>
> On Thu, 12 Apr 2018, 10:46 Evelyn Smith <u5015...@gmail.com> wrote:
> Are you building a search engine -> Solr
> Are you building an analytics function -> Spark
>
> I feel they are used in significantly different use cases, what are you
> trying to build?
>
> If it’s an analytics functionality that’s separate from your operations
> functionality I’d build it in its own DC. Spark is notorious for causing
> latency spikes in Cassandra which is not great if you are sensitive to
> that.
>
> Regards,
> Evelyn.
>> On 12 Apr 2018, at 6:55 am, kooljava2 <koolja...@yahoo.com.INVALID> wrote:
>>
>> Hello,
>>
>> We are exploring configuring Solr/Spark. Wanted to get input on this.
>> 1) How do we decide which one to use?
>> 2) Do we run this on a DC where there is less workload?
>>
>> Any other suggestions or comments are appreciated.
>>
>> Thank you.
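
On the throttling point in the reply above, here is a rough sketch of what client-side throttling could look like. It is not taken from this thread and is not how Spark or its Cassandra connector throttle internally. It is a generic fixed-interval pacer, and the names Throttle, throttled_read and fetch are hypothetical, chosen only for illustration. The idea is to cap the request rate so a bulk read is spread out over time instead of arriving as one burst.

```python
import time


class Throttle:
    """Simple pacer: lets at most `rate` operations through per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate   # minimum spacing between operations
        self.clock = clock           # injectable so the pacer can be tested
        self.sleep = sleep
        self.next_allowed = clock()  # earliest time the next operation may run

    def acquire(self):
        """Wait, if necessary, until the next operation is allowed."""
        now = self.clock()
        wait = self.next_allowed - now
        if wait > 0:
            self.sleep(wait)
        # Next slot is one interval after this slot (or after now, if we were idle).
        self.next_allowed = max(now, self.next_allowed) + self.interval


def throttled_read(keys, fetch, throttle):
    """Call the hypothetical `fetch(key)` for every key, paced by `throttle`."""
    results = []
    for key in keys:
        throttle.acquire()
        results.append(fetch(key))
    return results
```

With rate=4, for example, five reads are spaced a quarter of a second apart, so the batch takes about one second instead of hitting the database all at once. The trade-off is the one described in the reply: the job takes longer, but the operational workload sees a flatter load.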