I see. Good to learn about the interaction between spark.task.cpus and
spark.executor.cores. But am I right to say that PR #20553 can still be
used as an additional knob on top of those two? Say a user wants 1.5
cores per executor from Kubernetes, not the rounded-up integer value 2?
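For example, something like the following would let the scheduler pack
pods at 1.5-cpu granularity. A sketch only: spark.kubernetes.executor.request.cores
is the property name proposed in the PR as of this writing and may change
before it merges; spark.kubernetes.executor.limit.cores already exists:

    import org.apache.spark.SparkConf

    // Sketch, assuming PR #20553 lands with the proposed request.cores knob.
    val conf = new SparkConf()
      .set("spark.executor.cores", "2")                        // Spark-side task slots
      .set("spark.kubernetes.executor.request.cores", "1500m") // K8s cpu request: 1.5 cores
      .set("spark.kubernetes.executor.limit.cores", "2")       // hard cpu cap per pod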
> A relevant question is: should Spark on Kubernetes really be opinionated
> about how to set the cpu request and limit, and even try to determine
> this automatically?

Personally, I don't see how this can be auto-determined at all. I think the
best we can do is to come up with sensible default values for the most
common case, and provide well-documented knobs for the edge cases.

Thanks,
Kimoon

On Fri, Mar 30, 2018 at 12:37 PM, Yinan Li <liyinan...@gmail.com> wrote:

> PR #20553 <https://github.com/apache/spark/pull/20553> is more for
> allowing users to use a fractional value for cpu requests. The existing
> spark.executor.cores is sufficient for specifying more than one cpu.
>
> > One way to solve this could be to request more than 1 core from
> > Kubernetes per task. The exact amount we should request is unclear to
> > me (it largely depends on how many threads actually get spawned for a
> > task).
>
> A good indication is spark.task.cpus, and how many tasks, on average, a
> single executor is expected to run at any point in time. If each executor
> is only expected to run at most one task at any point in time,
> spark.executor.cores can be set equal to spark.task.cpus.
>
> A relevant question is: should Spark on Kubernetes really be opinionated
> about how to set the cpu request and limit, and even try to determine
> this automatically?
>
> On Fri, Mar 30, 2018 at 11:40 AM, Kimoon Kim <kim...@pepperdata.com>
> wrote:
>
>> > Instead of requesting `[driver,executor].memory`, we should just
>> > request `[driver,executor].memory + [driver,executor].memoryOverhead`.
>> > I think this case is a bit clearer than the CPU case, so I went ahead
>> > and filed an issue <https://issues.apache.org/jira/browse/SPARK-23825>
>> > with more details and made a PR
>> > <https://github.com/apache/spark/pull/20943>.
>>
>> I think this suggestion makes sense.
>>
>> > One way to solve this could be to request more than 1 core from
>> > Kubernetes per task. The exact amount we should request is unclear to
>> > me (it largely depends on how many threads actually get spawned for a
>> > task).
>>
>> I wonder if this is being addressed by PR #20553
>> <https://github.com/apache/spark/pull/20553> written by Yinan. Yinan?
>>
>> Thanks,
>> Kimoon
>>
>> On Thu, Mar 29, 2018 at 5:14 PM, David Vogelbacher <
>> dvogelbac...@palantir.com> wrote:
>>
>>> Hi,
>>>
>>> At the moment, driver and executor pods are created using the following
>>> requests and limits:
>>>
>>>   *Request:*
>>>     CPU:    [driver,executor].cores
>>>     Memory: [driver,executor].memory
>>>   *Limit:*
>>>     CPU:    Unlimited (but can be set via
>>>             spark.kubernetes.[driver,executor].limit.cores)
>>>     Memory: [driver,executor].memory + [driver,executor].memoryOverhead
>>>
>>> Specifying the requests like this leads to problems if the pods only
>>> get the requested amount of resources and none of the optional (limit)
>>> resources, as can happen in a fully utilized cluster.
>>>
>>> *For memory:*
>>> Let's say we have a node with 100 GiB of memory and 5 pods, each with
>>> 20 GiB of memory and 5 GiB of memoryOverhead.
>>> At the beginning, all 5 pods use 20 GiB of memory and all is well. If a
>>> pod then starts using its overhead memory, it will get killed, as there
>>> is no more memory available, even though we told Spark that it can use
>>> 25 GiB of memory.
>>>
>>> Instead of requesting `[driver,executor].memory`, we should request
>>> `[driver,executor].memory + [driver,executor].memoryOverhead`.
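>>>
>>> With the numbers above, each pod's request would then be 20 GiB + 5 GiB
>>> = 25 GiB, so Kubernetes would schedule at most 4 such pods on the
>>> 100 GiB node, and every pod is guaranteed the full amount it may
>>> actually use. A minimal sketch of the arithmetic (variable names are
>>> illustrative, not the actual ones in the Spark code base):
>>>
>>>     // Request the sum, so the overhead is guaranteed rather than optional.
>>>     val executorMemoryMiB   = 20 * 1024  // spark.executor.memory
>>>     val memoryOverheadMiB   = 5 * 1024   // spark.executor.memoryOverhead
>>>     val podMemoryRequestMiB = executorMemoryMiB + memoryOverheadMiB  // 25 GiB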
>>> I think this case is a bit clearer than the CPU case, so I went ahead
>>> and filed an issue <https://issues.apache.org/jira/browse/SPARK-23825>
>>> with more details and made a PR
>>> <https://github.com/apache/spark/pull/20943>.
>>>
>>> *For CPU:*
>>> As it turns out, there can be performance problems if we only have
>>> `executor.cores` available (which means we have one core per task).
>>> This was raised here
>>> <https://github.com/apache-spark-on-k8s/spark/issues/352> and is the
>>> reason that the cpu limit was set to unlimited.
>>> This issue stems from the fact that, in general, there will be more
>>> than one thread per task, resulting in performance impacts if only one
>>> core is available.
>>> However, I am not sure that just setting the limit to unlimited is the
>>> best solution, because it means that even if the Kubernetes cluster can
>>> perfectly satisfy the resource requests, performance might be very bad.
>>>
>>> I think we should guarantee that an executor is able to do its work
>>> well (without performance issues or getting killed, as could happen in
>>> the memory case) with the resources it gets guaranteed from Kubernetes.
>>>
>>> One way to solve this could be to request more than 1 core from
>>> Kubernetes per task (see the sketch at the end of this thread). The
>>> exact amount we should request is unclear to me (it largely depends on
>>> how many threads actually get spawned for a task).
>>> We would need to find a way to determine this automatically, or at
>>> least come up with a better default value than 1 core per task.
>>>
>>> Does somebody have ideas or thoughts on how to solve this best?
>>>
>>> Best,
>>> David
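To make David's "more than 1 core per task" idea concrete, a rough sketch
of the heuristic being discussed (the coresPerTask factor is hypothetical;
no such knob exists today):

    // Hypothetical heuristic: size the pod's cpu request from the expected
    // task concurrency, padding each task slot for its extra threads.
    val taskCpus        = 1    // spark.task.cpus
    val executorCores   = 4    // spark.executor.cores
    val concurrentTasks = executorCores / taskCpus  // 4 task slots per executor
    val coresPerTask    = 1.5  // hypothetical padding factor per task slot
    val cpuRequest      = concurrentTasks * coresPerTask  // request 6.0 cpus from K8s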