Flink also has a standalone mode.

On Thu, 28 Jul 2016 at 13:42 Ismaël Mejía <[email protected]> wrote:
> Good subject. YARN is the de facto standard, at least from the point of view of the Big Data distributions (Cloudera, Hortonworks, etc.) and cloud offerings (e.g. AWS EMR, Azure HDInsight and Google Dataproc), and given that it is supported by both Spark and Flink I think it is valuable to test the support for YARN. The question is, should the tests be run on 'Standalone OR YARN', or maybe we can have tests for 'Standalone AND YARN'?
>
> Ismael.
>
> On Thu, Jul 28, 2016 at 12:24 PM, Amit Sela <[email protected]> wrote:
>
> > Following a discussion I had with Kenneth and Dan here <https://github.com/apache/incubator-beam/pull/711>, I want to raise the issue of which resource manager we should use for ongoing tests that will run on actual clusters (on top of local/in-mem tests).
> > If we plan to test all runners on all their supported resource managers, great! But I guess this won't be the case, at least not at the beginning.
> >
> > Spark can run its own (Standalone Mode) resource manager, use YARN, or use Mesos. According to the latest survey <http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf> by Databricks, Standalone is in the lead (48%), with YARN trailing it (40%), while Mesos looks like the least favourite.
> > For Spark, I'd vote for Standalone as it is the most popular use case, and it avoids the additional complexity of maintaining YARN on this cluster.
> > Having said that, AFAIK Flink is a "first-class" YARN citizen (right?) and I don't know what available resource managers can be used by other runners, so I think runner authors should give their input here.
> >
> > *Summary:*
> > *Spark* - Standalone Mode or YARN (in that order).
> > *Flink* - ?
> > *Others* - ?
> >
> > Thanks,
> > Amit
