RE: Data platform support

2016-05-17 Thread Ashic Mahtab
If Spark workers are installed on the same nodes as Cassandra nodes, then they 
can take advantage of data locality, greatly reducing the amount of network IO 
in Spark jobs. If you use a separate cluster (Cloudera, Hortonworks, EMR), 
you won't be able to benefit from this. Other than the locality issue, you can 
run Spark jobs from external clusters just fine. I've used both approaches, and 
for particular types of jobs, I've found a "custom" cluster with Spark 
Master(s) + n*[Spark Worker + Cassandra] to be very effective. 
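
A rough way to picture the locality effect is the toy model below. It is only 
a sketch, not how Spark or the Cassandra connector actually schedules tasks: 
it assumes each partition lives on exactly one Cassandra node and counts the 
bytes read by tasks running on a different host. The node names, partition 
sizes and the network_bytes helper are all made up for illustration.

```python
# Toy model of data locality; not Spark's scheduler.
# All host names and sizes are hypothetical.

def network_bytes(partition_owner, partition_size, task_host):
    """Sum the bytes that must cross the network.

    partition_owner: partition id -> Cassandra node holding the data
    partition_size:  partition id -> size in bytes
    task_host:       partition id -> host running the Spark task that reads it
    """
    return sum(
        size
        for pid, size in partition_size.items()
        if task_host[pid] != partition_owner[pid]
    )


owners = {0: "node-a", 1: "node-b", 2: "node-c"}
sizes = {0: 100, 1: 200, 2: 300}

# Spark workers co-located with Cassandra: each task runs where its data lives.
colocated = dict(owners)

# Separate Spark cluster: no task shares a host with the data it reads.
external = {0: "spark-1", 1: "spark-2", 2: "spark-1"}

print(network_bytes(owners, sizes, colocated))  # 0
print(network_bytes(owners, sizes, external))   # 600
```

In the co-located layout nothing crosses the network in this model, while the 
external cluster has to pull every byte over the wire.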
-Ashic.

Date: Tue, 10 May 2016 17:13:25 +0100
Subject: Re: Data platform support
From: ksrinivas...@gmail.com
To: user@cassandra.apache.org

I understand that Spark supports HDFS and standalone modes. The recommendation 
from Cassandra is that Spark should be installed in standalone mode in the 
SMACK framework.

Re: Data platform support

2016-05-10 Thread Srini Sydney
I understand that Spark supports HDFS and standalone modes.
The recommendation from Cassandra is that Spark should be installed in
standalone mode in the SMACK framework.


Re: Data platform support

2016-05-10 Thread Sruti S
Not sure what is meant. Spark can access HDFS. Why is it in standalone
mode? Please clarify.
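
To illustrate why the two are not alternatives: standalone is one of Spark's
cluster managers, while HDFS is a storage layer, so a job submitted to a
standalone master can still be given an hdfs:// input path. The sketch below
only assembles a spark-submit command line to show that the two settings are
independent. The host names, port numbers, script name and HDFS path are
hypothetical placeholders, and spark_submit_command is a helper invented for
this example.

```python
# Sketch only: host names, ports and paths below are hypothetical placeholders.
import shlex


def spark_submit_command(master_url, cassandra_host, app, input_path):
    """Build a spark-submit command line as a list of arguments.

    The cluster manager (--master) and the input location are independent:
    a standalone master can run a job whose input is an hdfs:// URI.
    """
    return [
        "spark-submit",
        "--master", master_url,
        # Connection property read by the DataStax spark-cassandra-connector.
        "--conf", "spark.cassandra.connection.host=" + cassandra_host,
        app,
        input_path,  # handed to the application as its first argument
    ]


cmd = spark_submit_command(
    master_url="spark://spark-master.example:7077",  # standalone master URL
    cassandra_host="cassandra-1.example",
    app="load_from_data_lake.py",                    # hypothetical job script
    input_path="hdfs://namenode.example:8020/datalake/events",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

This assumes the Spark build includes the Hadoop client libraries and that the
Cassandra connector is available to the job; neither is shown here.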



Re: Data platform support

2016-05-10 Thread Srini Sydney
I'd like a clarification based on your answer:

Spark is installed in standalone mode (not HDFS) in the SMACK framework. Our 
data lake is in HDFS. How do we overcome this?

 
 - cheers sreeni




Re: Data platform support

2016-05-10 Thread vincent gromakowski
Maybe a SMACK stack would be a better option for using Spark with
Cassandra...


Re: Data platform support

2016-05-10 Thread Srini Sydney
Thanks a lot, Denise.



Re: Data platform support

2016-05-09 Thread Denise Rogers
It really depends on how close you want to stay to the most current versions 
of open source community products. 

Cloudera has tended to build more products that require their distribution 
not to be as current with open source product versions. 

Regards,
Denise





Data platform support

2016-05-09 Thread Srini Sydney
Hi guys

We are thinking of using one of the 3 big data platforms, i.e. Hortonworks, 
MapR or Cloudera. We will use Hadoop, Hive, ZooKeeper and Spark on these 
platforms.


Which platform would be better suited for Cassandra?

 
 -  sreeni