Re: help/suggestions to setup spark cluster
You can just cap the cores used per job.

http://spark.apache.org/docs/latest/spark-standalone.html
http://spark.apache.org/docs/latest/spark-standalone.html#resource-scheduling

On Thu, Apr 27, 2017 at 1:07 AM, vincent gromakowski wrote:
> Spark standalone is not multi-tenant; you need one cluster per job. Maybe
> you can try fair scheduling and use one cluster, but I doubt it will be
> production ready...

- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
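As a rough illustration of the core cap mentioned above, here is a minimal Python sketch that assembles a `spark-submit` command line with `spark.cores.max` set, which is the property the linked resource-scheduling page describes for limiting how many cores one application may take on a standalone cluster. The helper name `build_submit_command`, the master URL, the script name and the memory value are all made up for the example; only `spark-submit`, `--master`, `--conf`, `--executor-memory` and `spark.cores.max` come from Spark itself.

```python
import shlex


def build_submit_command(master_url, app_path, max_cores, executor_memory="2g"):
    """Return a spark-submit argument list that caps the application's cores.

    spark.cores.max limits the total cores one application may claim on a
    standalone cluster, leaving the rest for other applications.
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    return [
        "spark-submit",
        "--master", master_url,
        "--conf", f"spark.cores.max={max_cores}",
        "--executor-memory", executor_memory,
        app_path,
    ]


# Hypothetical master URL and script name, for illustration only.
cmd = build_submit_command("spark://master-host:7077", "hourly_batch.py", max_cores=4)
print(shlex.join(cmd))
```

The same property can also be set in the application's own configuration instead of on the command line; the sketch only shows where the cap goes, not what value suits a given cluster.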
Re: help/suggestions to setup spark cluster
Spark standalone is not multi-tenant; you need one cluster per job. Maybe you can try fair scheduling and use one cluster, but I doubt it will be production ready...

On 27 Apr 2017 at 5:28 AM, "anna stax" wrote:
> Thanks Cody,
>
> As I already mentioned, I am running Spark Streaming on an EC2 cluster in
> standalone mode. Now, in addition to streaming, I want to be able to run
> a Spark batch job hourly and ad hoc queries using Zeppelin.
>
> Can you please confirm that a standalone cluster is OK for this? Please
> provide me some links to help me get started.
>
> Thanks
> -Anna
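For readers who want to try the fair scheduling idea mentioned above, the following is a small Python sketch that generates a fair scheduler allocation file. Note that Spark's fair scheduler shares resources between jobs running inside one application (one SparkContext), so it helps most when the workloads are submitted through the same long-running application. The pool names, weights, minimum shares and file path below are invented for the example; the XML element names and the two `spark.scheduler.*` properties are the ones Spark's job scheduling documentation describes.

```python
import xml.etree.ElementTree as ET


def build_fair_scheduler_xml(pools):
    """Build the contents of a fair scheduler allocation file.

    `pools` maps a pool name to a (weight, min_share) pair.
    """
    root = ET.Element("allocations")
    for name, (weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


# Hypothetical pool names and numbers, chosen only for illustration.
xml_text = build_fair_scheduler_xml({"adhoc": (1, 1), "batch": (2, 2)})
print(xml_text)

# Properties an application would set so that the pools take effect.
fair_conf = {
    "spark.scheduler.mode": "FAIR",
    "spark.scheduler.allocation.file": "/path/to/fairscheduler.xml",  # hypothetical path
}
```

Whether this is enough for production use depends on the workload mix; the sketch only shows the shape of the configuration.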
Re: help/suggestions to setup spark cluster
Thanks Cody,

As I already mentioned, I am running Spark Streaming on an EC2 cluster in standalone mode. Now, in addition to streaming, I want to be able to run a Spark batch job hourly and ad hoc queries using Zeppelin.

Can you please confirm that a standalone cluster is OK for this? Please provide me some links to help me get started.

Thanks
-Anna

On Wed, Apr 26, 2017 at 7:46 PM, Cody Koeninger wrote:
> The standalone cluster manager is fine for production. Don't use YARN
> or Mesos unless you already have another need for it.
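One way to think about running the streaming application, the hourly batch job and Zeppelin on the same standalone cluster is as a core budget. By default a standalone application takes every available core, so a long-running streaming application without a cap would leave nothing for the other two. The Python sketch below checks that a set of per-application caps fits the cluster. The cluster size, the application names and the numbers are hypothetical, and `check_core_budget` is a helper written for this example, not part of Spark.

```python
def check_core_budget(total_cores, caps):
    """Return the cores left unallocated, or raise if the caps oversubscribe."""
    for name, cores in caps.items():
        if cores < 1:
            raise ValueError(f"{name} needs at least one core")
    used = sum(caps.values())
    if used > total_cores:
        raise ValueError(f"caps need {used} cores but the cluster has {total_cores}")
    return total_cores - used


# Hypothetical 16-core cluster shared by the three workloads from the message.
caps = {"streaming": 8, "hourly_batch": 4, "zeppelin": 2}
spare = check_core_budget(16, caps)

# Each cap would be passed to its application as spark.cores.max.
for name, cores in caps.items():
    print(f"{name}: spark.cores.max={cores}")
print(f"spare cores: {spare}")
```

Memory needs the same kind of accounting; the sketch leaves it out to keep the arithmetic short.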
Re: help/suggestions to setup spark cluster
The standalone cluster manager is fine for production. Don't use YARN or Mesos unless you already have another need for it.

On Wed, Apr 26, 2017 at 4:53 PM, anna stax wrote:
> Hi Sam,
>
> Thank you for the reply.
>
> What do you mean by "I doubt people run Spark on a single EC2 instance,
> certainly not in production, I don't think"?
>
> What is wrong with having a data pipeline on EC2 that reads data from Kafka,
> processes it using Spark and outputs to Cassandra? Please explain.
>
> Thanks
> -Anna
Re: help/suggestions to setup spark cluster
Hi Sam,

Thank you for the reply.

What do you mean by "I doubt people run Spark on a single EC2 instance, certainly not in production, I don't think"?

What is wrong with having a data pipeline on EC2 that reads data from Kafka, processes it using Spark and outputs to Cassandra? Please explain.

Thanks
-Anna

On Wed, Apr 26, 2017 at 2:22 PM, Sam Elamin wrote:
> Hi Anna,
>
> There are a variety of options for launching Spark clusters. I doubt
> people run Spark on a single EC2 instance, certainly not in production,
> I don't think.
>
> I don't have enough information about what you are trying to do, but if
> you are just trying to set things up from scratch then I think you can
> just use EMR, which will create a cluster for you and attach a Zeppelin
> instance as well.
>
> You can also use Databricks for ease of use and very little management,
> but you will pay a premium for that abstraction.
>
> Regards
> Sam
Re: help/suggestions to setup spark cluster
Hi Anna,

There are a variety of options for launching Spark clusters. I doubt people run Spark on a single EC2 instance, certainly not in production, I don't think.

I don't have enough information about what you are trying to do, but if you are just trying to set things up from scratch then I think you can just use EMR, which will create a cluster for you and attach a Zeppelin instance as well.

You can also use Databricks for ease of use and very little management, but you will pay a premium for that abstraction.

Regards
Sam

On Wed, 26 Apr 2017 at 22:02, anna stax wrote:
> I need to set up a Spark cluster for Spark Streaming, scheduled batch
> jobs and ad hoc queries.
> Please give me some suggestions. Can this be done in standalone mode?
>
> Right now we have a Spark cluster in standalone mode on AWS EC2 running
> a Spark Streaming application. Can we run Spark batch jobs and Zeppelin on
> the same cluster? Do we need a better resource manager like Mesos?
>
> Are there any companies or individuals that can help in setting this up?
>
> Thank you.
>
> -Anna
help/suggestions to setup spark cluster
I need to set up a Spark cluster for Spark Streaming, scheduled batch jobs and ad hoc queries. Please give me some suggestions. Can this be done in standalone mode?

Right now we have a Spark cluster in standalone mode on AWS EC2 running a Spark Streaming application. Can we run Spark batch jobs and Zeppelin on the same cluster? Do we need a better resource manager like Mesos?

Are there any companies or individuals that can help in setting this up?

Thank you.

-Anna