Following a discussion I had with Kenneth and Dan here <https://github.com/apache/incubator-beam/pull/711>, I want to raise the issue of which resource manager we should use for ongoing tests that will run on actual clusters (on top of local/in-mem tests). If we plan to test all runners on all their supported resource managers, great! But I guess this won't be the case, at least not at the beginning.
Spark can run its own resource manager (Standalone Mode), or use YARN or Mesos. According to the latest survey <http://go.databricks.com/hubfs/DataBricks_Surveys_-_Content/Spark-Survey-2015-Infographic.pdf> by Databricks, Standalone is in the lead (48%), with YARN close behind (40%), while Mesos looks like the least favourite.

For Spark, I'd vote for Standalone, as it is the most popular use case and it avoids the additional complexity of maintaining YARN on this cluster.

Having said that, AFAIK Flink is a "first-class" YARN citizen (right?), and I don't know which resource managers the other runners can use, so I think runner authors should give their input here.

*Summary:*
*Spark* - Standalone Mode or YARN (in that order).
*Flink* - ?
*Others* - ?

Thanks,
Amit
