Re: [DISCUSS] cluster infrastructure - resource manager - for on going tests

Aljoscha Krettek Thu, 28 Jul 2016 09:48:40 -0700

For Flink, YARN is fine and I guess it's the common denominator for all
runners (except DataflowRunner, of course).


@Kenn IMHO the common deployment is Kafka (running standalone, because it
only works that way), which also requires ZooKeeper (if I'm not mistaken),
plus YARN, which all runners should be able to run on.

On Thu, 28 Jul 2016 at 18:36 Kenneth Knowles <[email protected]> wrote:

> Presumably we'll eventually also run additional services alongside (like
> Kafka) to have true integration tests for I/O connectors. What is the
> common deployment in this case?
>
> On Jul 28, 2016 06:35, "Amit Sela" <[email protected]> wrote:
>
> > So what would be the preferred resource manager to test Flink on ?
> >
> > On Thu, Jul 28, 2016, 16:34 Aljoscha Krettek <[email protected]> wrote:
> >
> > > Flink also has a standalone mode.
> > >
> > > On Thu, 28 Jul 2016 at 13:42 Ismaël Mejía <[email protected]> wrote:
> > >
> > > > Good subject. YARN is the de facto standard, at least from the point of
> > > > view of the Big Data distributions (Cloudera, Hortonworks, etc.) and
> > > > cloud offerings (e.g. AWS EMR, Azure HDInsight and Google Dataproc), and
> > > > given that it is supported by both Spark and Flink I think it is valuable
> > > > to test the support for YARN. The question is, should the tests be run on
> > > > 'Standalone OR YARN', or maybe we can have tests for 'Standalone AND YARN'?
> > > >
> > > > Ismael.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jul 28, 2016 at 12:24 PM, Amit Sela <[email protected]> wrote:
> > > >
> > > > > Following a discussion I had with Kenneth and Dan here
> > > > > <https://github.com/apache/incubator-beam/pull/711>, I want to raise
> > > > > the issue of which resource manager we should use for ongoing tests
> > > > > that will run on actual clusters (on top of local/in-mem tests).
> > > > > If we plan to test all runners on all their supported resource
> > > > > managers, great! But I guess this won't be the case, at least not at
> > > > > the beginning.
> > > > >
> > > > > Spark can run its own resource manager (Standalone Mode), use YARN, or
> > > > > use Mesos. According to the latest survey by Databricks
> > > > > <http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf>,
> > > > > Standalone is in the lead (48%), with YARN trailing it (40%), while
> > > > > Mesos looks like the least favourite.
> > > > > For Spark, I'd vote for Standalone as it is the most popular use case
> > > > > and it avoids the additional complexity of maintaining YARN on this
> > > > > cluster.
> > > > > Having said that, AFAIK Flink is a "first-class" YARN citizen (right?)
> > > > > and I don't know what available resource managers can be used by other
> > > > > runners, so I think runner authors should give their input here.
> > > > >
> > > > > *Summary:*
> > > > > *Spark* - Standalone Mode or YARN (in that order).
> > > > > *Flink* - ?
> > > > > *Others* - ?
> > > > >
> > > > > Thanks,
> > > > > Amit
> > > > >
> > > >
> > >
> >
>
