If I add an additional small box to the cluster, can I configure YARN to select the small box to run the AM container?
On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote:

> Typically YARN is there because you're mediating resource requests
> from things besides Spark, so yeah using every bit of the cluster is a
> little bit of a corner case. There's not a good answer if all your
> nodes are the same size.
>
> I think you can let YARN over-commit RAM though, and allocate more
> memory than it actually has. It may be beneficial to let them all
> think they have an extra GB, and let one node running the AM
> technically be overcommitted, a state which won't hurt at all unless
> you're really really tight on memory, in which case something might
> get killed.
>
> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
> > Alex,
> >
> > That's a very good question that I've been trying to answer myself
> > recently too. Since you've mentioned before that you're using EMR, I
> > assume you're asking this because you've noticed this behavior on
> > emr-4.3.0.
> >
> > In this release, we made some changes to the maximizeResourceAllocation
> > (which you may or may not be using, but either way this issue is
> > present), including the accidental inclusion of somewhat of a bug that
> > makes it not reserve any space for the AM, which ultimately results in
> > one of the nodes being utilized only by the AM and not an executor.
> >
> > However, as you point out, the only viable fix seems to be to reserve
> > enough memory for the AM on *every single node*, which in some cases
> > might actually be worse than wasting a lot of memory on a single node.
> >
> > So yeah, I also don't like either option. Is this just the price you
> > pay for running on YARN?
> >
> >
> > ~ Jonathan
> >
> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov <apivova...@gmail.com>
> > wrote:
> >>
> >> Lets say that yarn has 53GB memory available on each slave
> >>
> >> spark.am container needs 896MB. (512 + 384)
> >>
> >> I see two options to configure spark:
> >>
> >> 1. configure spark executors to use 52GB and leave 1 GB on each box. So,
> >> some box will also run am container. So, 1GB memory will not be used on
> >> all slaves but one.
> >>
> >> 2. configure spark to use all 53GB and add additional 53GB box which
> >> will run only am container. So, 52GB on this additional box will do
> >> nothing
> >>
> >> I do not like both options. Is there a better way to configure
> >> yarn/spark?
> >>
> >>
> >> Alex
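
For anyone weighing the two options quoted above, here is a rough sketch of the arithmetic in Python. The 53GB per node, the 896MB AM container and the 1GB reserve come from the thread. The function names and the node counts in the loop are made up for illustration, and the model assumes one executor per node, which is a simplification.

```python
# All sizes in MB. Per-node memory and AM size are taken from the thread;
# the node counts used in the loop below are hypothetical.
NODE_MB = 53 * 1024      # YARN memory available on each slave
AM_MB = 512 + 384        # AM container size (896 MB)
RESERVE_MB = 1024        # option 1: memory left free on every node


def idle_option1(num_nodes: int) -> int:
    """Option 1: 52 GB executors on every node, AM uses one node's spare 1 GB."""
    return num_nodes * RESERVE_MB - AM_MB


def idle_option2() -> int:
    """Option 2: one extra 53 GB node that hosts only the AM."""
    return NODE_MB - AM_MB


def break_even_nodes() -> int:
    """Smallest cluster size at which option 1 idles at least as much as option 2."""
    return -(-NODE_MB // RESERVE_MB)  # ceiling division


if __name__ == "__main__":
    for n in (10, 53, 100):
        print(f"{n:>3} nodes: option 1 idles {idle_option1(n)} MB, "
              f"option 2 idles {idle_option2()} MB")
```

Under these assumptions the two options idle the same amount of memory at 53 nodes. Below that, reserving 1GB on every node wastes less; above it, the dedicated AM box wastes less, which matches the point that reserving on every node can be the worse choice in some cases.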