Hi Prashant,

I guess you are referring to local-cluster mode? AFAIK, local-cluster mode has never been mentioned in the user guide, so it should only be used in Spark tests. Also, there are a few differences between having multiple workers on the same node and having one worker on each node: as I mentioned in https://issues.apache.org/jira/browse/SPARK-27371 , a complex approach is needed to resolve resource-requirement contention between different workers running on the same node.
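For reference, local-cluster mode is selected purely through the master URL, and a test spins it up roughly like this (a minimal sketch; the worker count and sizes are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // "local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB]" starts an
    // in-process standalone cluster; it is only used in Spark's own test suites.
    val conf = new SparkConf()
      .setMaster("local-cluster[2, 1, 1024]") // illustrative: 2 workers, 1 core / 1024 MB each
      .setAppName("local-cluster-demo")
    val sc = new SparkContext(conf)
    try {
      // Tasks run in real executor JVMs; here each worker hosts one executor.
      println(sc.parallelize(1 to 100, 4).sum())
    } finally {
      sc.stop()
    }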
Cheers,
Xingbo

On Thu, Mar 5, 2020 at 8:49 PM Prashant Sharma <scrapco...@gmail.com> wrote:

> It was by design: one could run multiple workers on one's laptop for trying
> out or testing Spark in distributed mode, launching multiple workers to see
> how resource offers and requirements work. Certainly, I have not commonly
> seen starting multiple workers on the same node as a practice so far.
>
> Why do we consider it a special case for scheduling when two workers are on
> the same node rather than on two different nodes? Possibly to optimize
> network I/O and disk I/O?
>
> On Tue, Mar 3, 2020 at 12:45 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>
>> Thanks, Sean, for your input. I really think we could simplify the Spark
>> Standalone backend a lot by only allowing a single worker on the same
>> host, and I can confirm that, AFAIK, this deploy model can satisfy all the
>> workloads deployed on the Standalone backend.
>>
>> Regarding the case of multiple distinct Spark clusters each running a
>> worker on one machine, I'm not sure whether that's something we have
>> claimed to support; could someone with more context on this scenario share
>> their use case?
>>
>> Cheers,
>>
>> Xingbo
>>
>> On Fri, Feb 28, 2020 at 11:29 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> I'll admit, I didn't know you could deploy multiple workers per machine.
>>> I agree, I don't see the use case for it; multiple executors, yes, of
>>> course. And I guess you could imagine multiple distinct Spark clusters
>>> running a worker on one machine. I don't have an informed opinion,
>>> therefore, but I agree that enforcing one worker per machine seems like
>>> enough of a best practice, if it makes things simpler rather than harder.
>>>
>>> On Fri, Feb 28, 2020 at 1:21 PM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Based on my experience, there is no scenario that necessarily requires
>>> > deploying multiple workers on the same node with the Standalone
>>> > backend. A worker should book all the resources reserved for Spark on
>>> > the host where it is launched, then allocate those resources to one or
>>> > more executors launched by this worker. Since each executor runs in a
>>> > separate JVM, we can limit the memory of each executor to avoid long GC
>>> > pauses (a configuration sketch follows after this thread).
>>> >
>>> > The remaining concern is that local-cluster mode is implemented by
>>> > launching multiple workers on the local host; we might need to
>>> > re-implement LocalSparkCluster to launch only one worker and multiple
>>> > executors. It should be fine, because local-cluster mode is only used
>>> > in running Spark unit test cases, so end users should not be affected
>>> > by this change.
>>> >
>>> > Removing support for multiple workers on the same host would simplify
>>> > the deploy model of the Standalone backend and also reduce the burden
>>> > of supporting a legacy deploy pattern in future feature development.
>>> > (There is an example in
>>> > https://issues.apache.org/jira/browse/SPARK-27371 , where we designed
>>> > a complex approach to coordinate resource requirements from different
>>> > workers launched on the same host.)
>>> >
>>> > The proposal is to update the documentation to deprecate support for
>>> > the `SPARK_WORKER_INSTANCES` environment variable in Spark 3.0, and
>>> > remove the support in the next major version (Spark 3.1).
>>> >
>>> > Please kindly let me know if you have use cases relying on this
>>> > feature.
>>> >
>>> > Thanks!
>>> >
>>> > Xingbo
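To make the point about bounded executor heaps concrete: under a
single-worker-per-host model, each executor's JVM heap is still capped by
standard application configuration, so a second worker process is not needed
for GC isolation. A minimal sketch (the master URL and sizes below are only
illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // One worker books the whole host; the application then asks the cluster
    // for several small-heap executors instead of several worker processes.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // illustrative standalone master URL
      .setAppName("bounded-executor-heaps")
      .set("spark.executor.memory", "8g") // heap cap for each executor JVM
      .set("spark.executor.cores", "4")   // cores per executor; the worker packs
                                          // as many such executors as its
                                          // reserved resources allow
    val sc = new SparkContext(conf)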