If Spark workers are installed on the same nodes as Cassandra, they can take advantage of data locality, which greatly reduces network IO in Spark jobs. If you use a separate cluster (Cloudera, Hortonworks, EMR), you won't be able to benefit from this. Other than the locality issue, you can run Spark jobs from external clusters just fine. I've used both approaches, and for particular types of jobs I've found a "custom" cluster with Spark Master(s) + n*[Spark Worker + Cassandra] to be very effective.

-Ashic.
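To make the locality point concrete, here is a rough toy model in Python. It is not Spark or Cassandra code: the host names, the replica layout and the helper `local_read_fraction` are all made up for illustration. It only counts how many partitions have a replica on a host that also runs a Spark worker, which is the case where a read can avoid the network.

```python
# Toy model only: hosts, replica layout and this helper are hypothetical.

def local_read_fraction(partition_replicas, worker_hosts):
    """Fraction of partitions with at least one replica on a Spark worker host."""
    workers = set(worker_hosts)
    if not partition_replicas:
        return 0.0
    # A partition counts as "local" if any of its replica hosts runs a worker.
    local = sum(1 for replicas in partition_replicas if workers & set(replicas))
    return local / len(partition_replicas)


# Four token ranges, each replicated on two of three Cassandra nodes (assumed).
partition_replicas = [
    ["cass1", "cass2"],
    ["cass2", "cass3"],
    ["cass3", "cass1"],
    ["cass1", "cass2"],
]

# Workers co-located with every Cassandra node vs. workers on a separate cluster.
co_located = local_read_fraction(partition_replicas, ["cass1", "cass2", "cass3"])
external = local_read_fraction(partition_replicas, ["emr1", "emr2", "emr3"])
print(co_located, external)
```

With workers on every Cassandra node, every partition in this model has a local replica. With workers on a separate cluster, none do, so every read crosses the network.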
Date: Tue, 10 May 2016 17:13:25 +0100
Subject: Re: Data platform support
From: ksrinivas...@gmail.com
To: user@cassandra.apache.org

I understand that Spark supports HDFS and standalone modes. The recommendation from Cassandra is that Spark should be installed in standalone mode in the SMACK framework.

On 10 May 2016 at 16:24, Sruti S <sruti.shivaku...@gmail.com> wrote:

Not sure what is meant. Spark can access HDFS. Why is it in standalone mode? Please clarify.

On Tue, May 10, 2016 at 11:08 AM, Srini Sydney <ksrinivas...@gmail.com> wrote:

I have a clarification based on your answer: Spark is installed in standalone mode (not HDFS) in the SMACK framework. Our data lake is in HDFS. How do we overcome this?

- cheers
sreeni

On 10 May 2016, at 08:16, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:

Maybe a SMACK stack would be a better option for using Spark with Cassandra...

On 10 May 2016 at 8:45 AM, "Srini Sydney" <ksrinivas...@gmail.com> wrote:

Thanks a lot, Denise.

On 10 May 2016 at 02:42, Denise Rogers <datag...@aol.com> wrote:

It really depends how close you want to stay to the most current versions of open source community products. Cloudera has tended to build more products that require their distribution to not be as current with open source product versions.

Regards,
Denise
Sent from my iPhone

> On May 9, 2016, at 8:21 PM, Srini Sydney <ksrinivas...@gmail.com> wrote:
>
> Hi guys
>
> We are thinking of using one of the 3 big data platforms, i.e. Hortonworks, MapR
> or Cloudera. We will use Hadoop, Hive, ZooKeeper, and Spark on these platforms.
>
> Which platform would be better suited for Cassandra?
>
> - sreeni