If Spark workers are installed on the same nodes as Cassandra, they can take 
advantage of data locality, greatly reducing the amount of network IO in Spark 
jobs. If you use a separate cluster (Cloudera, Hortonworks, EMR), you won't be 
able to benefit from this. Other than the locality issue, you can 
run Spark jobs from external clusters just fine. I've used both approaches, and 
for particular types of jobs, I've found a "custom" cluster with Spark 
Master(s) + n*[Spark Worker + Cassandra] to be very effective. 
-Ashic.
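
To make the locality point above concrete, here is a toy Python model. It is 
only a sketch: it is not the Spark Cassandra connector's scheduling logic, and 
the node names and replica layout are made up for illustration. It counts how 
many token ranges have a replica on a node that also runs a Spark worker.

```python
# Toy model of data locality; not the Spark Cassandra connector's real scheduler.
# All node names and the replica layout below are hypothetical.

def local_read_fraction(partition_replicas, spark_workers):
    """Return the fraction of partitions with at least one replica on a Spark worker node."""
    if not partition_replicas:
        return 0.0
    workers = set(spark_workers)
    local = sum(1 for replicas in partition_replicas.values() if workers & set(replicas))
    return local / len(partition_replicas)

# Four token ranges, each replicated on two of three Cassandra nodes.
replicas = {
    "range-0": ["cass-1", "cass-2"],
    "range-1": ["cass-2", "cass-3"],
    "range-2": ["cass-3", "cass-1"],
    "range-3": ["cass-1", "cass-2"],
}

# Workers co-located with Cassandra versus workers on a separate cluster.
colocated = local_read_fraction(replicas, ["cass-1", "cass-2", "cass-3"])
external = local_read_fraction(replicas, ["emr-1", "emr-2", "emr-3"])
print(f"co-located workers: {colocated:.0%} of ranges readable locally")
print(f"external cluster:   {external:.0%} of ranges readable locally")
```

In this model every range is local when workers share nodes with Cassandra, 
and none are when the workers sit on an external cluster, so every read then 
crosses the network.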

Date: Tue, 10 May 2016 17:13:25 +0100
Subject: Re: Data platform support
From: ksrinivas...@gmail.com
To: user@cassandra.apache.org

I understand that Spark supports HDFS and standalone modes. The recommendation 
from Cassandra is that Spark should be installed in standalone mode in the 
SMACK framework.
On 10 May 2016 at 16:24, Sruti S <sruti.shivaku...@gmail.com> wrote:
Not sure what is meant. Spark can access HDFS. Why is it in standalone mode? 
Please clarify.
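
For context on the question above: "standalone" names Spark's built-in cluster 
manager, not a storage choice, so a job submitted to a standalone master can 
still be given an hdfs:// input path. Below is a minimal Python sketch that 
only assembles a spark-submit command line. The host names, ports, job file 
and HDFS path are hypothetical, and it assumes the Spark build includes Hadoop 
client libraries and that the Spark Cassandra connector is on the classpath.

```python
# Sketch only: host names, port numbers, paths, and the job file are hypothetical.

def build_spark_submit(master_host, cassandra_host, job_file, hdfs_input):
    """Build a spark-submit argument list for a standalone master with an HDFS input path."""
    return [
        "spark-submit",
        # A "spark://" master URL selects Spark's standalone cluster manager.
        "--master", f"spark://{master_host}:7077",
        # Property read by the Spark Cassandra connector to locate the cluster.
        "--conf", f"spark.cassandra.connection.host={cassandra_host}",
        job_file,
        # Passed to the job as an ordinary argument; the job would use it as its input path.
        hdfs_input,
    ]

cmd = build_spark_submit(
    "spark-master", "cass-1", "job.py", "hdfs://namenode:8020/datalake/events"
)
print(" ".join(cmd))
```

The cluster manager and the storage layer are chosen independently here: the 
master URL picks standalone mode, while the input URI points at HDFS.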
On Tue, May 10, 2016 at 11:08 AM, Srini Sydney <ksrinivas...@gmail.com> wrote:
I have a clarification based on your answer:
Spark is installed in standalone mode (not HDFS) in the SMACK framework. Our 
data lake is in HDFS. How do we overcome this?

  - cheers sreeni

On 10 May 2016, at 08:16, vincent gromakowski <vincent.gromakow...@gmail.com> 
wrote:

Maybe a SMACK stack would be a better option for using Spark with Cassandra...
On 10 May 2016 at 8:45 AM, "Srini Sydney" <ksrinivas...@gmail.com> wrote:
Thanks a lot, Denise.
On 10 May 2016 at 02:42, Denise Rogers <datag...@aol.com> wrote:
It really depends on how close you want to stay to the most current versions 
of open source community products.

Cloudera has tended to build more of its own products, which means its 
distribution tends not to be as current with open source product versions.

Regards,
Denise

> On May 9, 2016, at 8:21 PM, Srini Sydney <ksrinivas...@gmail.com> wrote:
>
> Hi guys
>
> We are thinking of using one of the 3 big data platforms, i.e. Hortonworks,
> MapR or Cloudera. We will use Hadoop, Hive, ZooKeeper, and Spark on these
> platforms.
>
> Which platform would be better suited for Cassandra?
>
> - sreeni









                                          
