Good subject. YARN is the de facto standard, at least from the point of
view of the Big Data distributions (Cloudera, Hortonworks, etc.) and cloud
offerings (e.g. AWS EMR, Azure HDInsight and Google Dataproc). Given that
it is supported by both Spark and Flink, I think it is valuable to test the
support for YARN. The question is: should the tests be run on 'Standalone
OR YARN', or can we have tests for 'Standalone AND YARN'?

Ismael.




On Thu, Jul 28, 2016 at 12:24 PM, Amit Sela <[email protected]> wrote:

> Following a discussion I had with Kenneth and Dan here
> <https://github.com/apache/incubator-beam/pull/711>, I want to raise the
> issue of which resource manager we should use for ongoing tests that will
> run on actual clusters (on top of local/in-mem tests).
> If we plan to test all runners on all their supported resource managers,
> great! But I guess this won't be the case, at least not at the beginning.
>
> Spark can run its own (Standalone Mode) resource manager, use YARN or use
> Mesos. According to the latest survey
> <http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf>
> by Databricks, Standalone is in the lead (48%), with YARN trailing it
> (40%), while Mesos looks like the least favourite.
> For Spark, I'd vote for Standalone, as it is the most popular use case and
> it avoids the additional complexity of maintaining YARN on this cluster.
> Having said that, AFAIK Flink is a "first-class" YARN citizen (right?), and
> I don't know which resource managers can be used by other runners,
> so I think runner authors should give their input here.
>
> *Summary:*
> *Spark* - Standalone Mode or YARN (in that order).
> *Flink* - ?
> *Others* - ?
>
> Thanks,
> Amit
>
