In summary, it looks like a combination of David's (#20943 <https://github.com/apache/spark/pull/20943>) and Yinan's PR (#20553 <https://github.com/apache/spark/pull/20553>) are good solutions here. Agreed on the importance of requesting memoryoverhead up front.
I'm also wondering if we should support running in other QoS classes - https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes, like maybe best-effort as well i.e. launching in a configuration that has neither the limit nor the request specified. I haven't seen a use-case but I can imagine this is a way for people to achieve better utilization with low priority long-running jobs. On Fri, Mar 30, 2018 at 3:06 PM Yinan Li <liyinan...@gmail.com> wrote: > Yes, the PR allows you to set say 1.5. The New configuration property > defaults to spark.executor.cores, which defaults to 1. > > On Fri, Mar 30, 2018, 3:03 PM Kimoon Kim <kim...@pepperdata.com> wrote: > >> David, glad it helped! And thanks for your clear example. >> >> > The only remaining question would then be what a sensible default for >> *spark.kubernetes.executor.cores *would be. Seeing that I wanted more >> than 1 and Yinan wants less, leaving it at 1 night be best. >> >> 1 as default SGTM. >> >> Thanks, >> Kimoon >> >> On Fri, Mar 30, 2018 at 1:38 PM, David Vogelbacher < >> dvogelbac...@palantir.com> wrote: >> >>> Thanks for linking that PR Kimoon. >>> >>> >>> It actually does mostly address the issue I was referring to. As the >>> issue <https://github.com/apache-spark-on-k8s/spark/issues/352> I >>> linked in my first email states, one physical cpu might not be enough to >>> execute a task in a performant way. >>> >>> >>> >>> So if I set *spark.executor.cores=1* and *spark.task.cpus=1* , I will >>> get 1 core from Kubernetes and execute one task per Executor and run into >>> performance problems. >>> >>> Being able to specify `spark.kubernetes.executor.cores=1.2` would fix >>> the issue (1.2 is just an example). >>> >>> I am curious as to why you, Yinan, would want to use this property to >>> request less than 1 physical cpu (that is how it sounds to me on the PR). >>> >>> Do you have testing that indicates that less than 1 physical CPU is >>> enough for executing tasks? >>> >>> >>> >>> In the end it boils down to the question proposed by Yinan: >>> >>> > A relevant question is should Spark on Kubernetes really be >>> opinionated on how to set the cpu request and limit and even try to >>> determine this automatically? >>> >>> >>> >>> And I completely agree with your answer Kimoon, we should provide >>> sensible defaults and make it configurable, as Yinan’s PR does. >>> >>> The only remaining question would then be what a sensible default for >>> *spark.kubernetes.executor.cores >>> *would be. Seeing that I wanted more than 1 and Yinan wants less, >>> leaving it at 1 night be best. >>> >>> >>> >>> Thanks, >>> >>> David >>> >>> >>> >>> *From: *Kimoon Kim <kim...@pepperdata.com> >>> *Date: *Friday, March 30, 2018 at 4:28 PM >>> *To: *Yinan Li <liyinan...@gmail.com> >>> *Cc: *David Vogelbacher <dvogelbac...@palantir.com>, " >>> dev@spark.apache.org" <dev@spark.apache.org> >>> *Subject: *Re: [Kubernetes] Resource requests and limits for Driver and >>> Executor Pods >>> >>> >>> >>> I see. Good to learn the interaction between spark.task.cpus and >>> spark.executor.cores. But am I right to say that PR #20553 can be still >>> used as an additional knob on top of those two? Say a user wants 1.5 core >>> per executor from Kubernetes, not the rounded up integer value 2? >>> >>> >>> >>> > A relevant question is should Spark on Kubernetes really be >>> opinionated on how to set the cpu request and limit and even try to >>> determine this automatically? >>> >>> >>> >>> Personally, I don't see how this can be auto-determined at all. I think >>> the best we can do is to come up with sensible default values for the most >>> common case, and provide and well-document other knobs for edge cases. >>> >>> >>> Thanks, >>> >>> Kimoon >>> >>> >>> >>> On Fri, Mar 30, 2018 at 12:37 PM, Yinan Li <liyinan...@gmail.com> wrote: >>> >>> PR #20553 [github.com] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_pull_20553&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=jqIG5lO5tnV3K3SDPPxw2bEHs0i6cltoaLh8K39JTTQ&e=> >>> is >>> more for allowing users to use a fractional value for cpu requests. The >>> existing spark.executor.cores is sufficient for specifying more than one >>> cpus. >>> >>> >>> >>> > One way to solve this could be to request more than 1 core from >>> Kubernetes per task. The exact amount we should request is unclear to me >>> (it largely depends on how many threads actually get spawned for a task). >>> >>> A good indication is spark.task.cpus, and on average how many tasks are >>> expected to run by a single executor at any point in time. If each executor >>> is only expected to run one task at most at any point in time, >>> spark.executor.cores can be set to be equal to spark.task.cpus. >>> >>> A relevant question is should Spark on Kubernetes really be opinionated >>> on how to set the cpu request and limit and even try to determine this >>> automatically? >>> >>> >>> >>> On Fri, Mar 30, 2018 at 11:40 AM, Kimoon Kim <kim...@pepperdata.com> >>> wrote: >>> >>> > Instead of requesting `[driver,executor].memory`, we should just >>> request `[driver,executor].memory + [driver,executor].memoryOverhead `. I >>> think this case is a bit clearer than the CPU case, so I went ahead and >>> filed an issue [issues.apache.org] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SPARK-2D23825&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=hA8h-KIeJ_6Khjx1JzFZF55ZH3GnSrB4HEkHc1I-yBc&e=> >>> with >>> more details and made a PR [github.com] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_pull_20943&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=qZFhxef7FgsA9UfijbVtKAIDuchcTf9wQxYIKL87SsU&e=> >>> . >>> >>> I think this suggestion makes sense. >>> >>> >>> >>> > One way to solve this could be to request more than 1 core from >>> Kubernetes per task. The exact amount we should request is unclear to me >>> (it largely depends on how many threads actually get spawned for a task). >>> >>> >>> >>> I wonder if this is being addressed by PR #20553 [github.com] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_pull_20553&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=jqIG5lO5tnV3K3SDPPxw2bEHs0i6cltoaLh8K39JTTQ&e=> >>> written >>> by Yinan. Yinan? >>> >>> >>> Thanks, >>> >>> Kimoon >>> >>> >>> >>> On Thu, Mar 29, 2018 at 5:14 PM, David Vogelbacher < >>> dvogelbac...@palantir.com> wrote: >>> >>> Hi, >>> >>> >>> >>> At the moment driver and executor pods are created using the following >>> requests and limits: >>> >>> >>> >>> *CPU* >>> >>> *Memory* >>> >>> *Request* >>> >>> [driver,executor].cores >>> >>> [driver,executor].memory >>> >>> *Limit* >>> >>> Unlimited (but can be specified using spark.[driver,executor].cores) >>> >>> [driver,executor].memory + [driver,executor].memoryOverhead >>> >>> >>> >>> Specifying the requests like this leads to problems if the pods only get >>> the requested amount of resources and nothing of the optional (limit) >>> resources, as it can happen in a fully utilized cluster. >>> >>> >>> >>> *For memory:* >>> >>> Let’s say we have a node with 100GiB memory and 5 pods with 20 GiB >>> memory and 5 GiB memoryOverhead. >>> >>> At the beginning all 5 pods use 20 GiB of memory and all is well. If a >>> pod then starts using its overhead memory it will get killed as there is no >>> more memory available, even though we told spark >>> >>> that it can use 25 GiB of memory. >>> >>> >>> >>> Instead of requesting `[driver,executor].memory`, we should just request >>> `[driver,executor].memory + [driver,executor].memoryOverhead `. >>> >>> I think this case is a bit clearer than the CPU case, so I went ahead >>> and filed an issue [issues.apache.org] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SPARK-2D23825&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=hA8h-KIeJ_6Khjx1JzFZF55ZH3GnSrB4HEkHc1I-yBc&e=> >>> with more details and made a PR [github.com] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_pull_20943&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=qZFhxef7FgsA9UfijbVtKAIDuchcTf9wQxYIKL87SsU&e=> >>> . >>> >>> >>> >>> *For CPU:* >>> >>> As it turns out, there can be performance problems if we only have >>> `executor.cores` available (which means we have one core per task). This >>> was raised here [github.com] >>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache-2Dspark-2Don-2Dk8s_spark_issues_352&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=BFXcJr3WTIvmlY-gtaiCO5QK4bLix2sgwDDpPfrZKoE&m=TrCA4oIVKyN3M_ExqpHr7bbhi14uvoEaspPwclIJI4M&s=uTMrl29jkJRlc_N1S_6lvwCjkovzrsan8zIczzxDZGM&e=> >>> and is the reason that the cpu limit was set to unlimited. >>> >>> This issue stems from the fact that in general there will be more than >>> one thread per task, resulting in performance impacts if there is only one >>> core available. >>> >>> However, I am not sure that just setting the limit to unlimited is the >>> best solution because it means that even if the Kubernetes cluster can >>> perfectly satisfy the resource requests, performance might be very bad. >>> >>> >>> >>> I think we should guarantee that an executor is able to do its work well >>> (without performance issues or getting killed - as could happen in the >>> memory case) with the resources it gets guaranteed from Kubernetes. >>> >>> >>> >>> One way to solve this could be to request more than 1 core from >>> Kubernetes per task. The exact amount we should request is unclear to me >>> (it largely depends on how many threads actually get spawned for a task). >>> >>> We would need to find a way to determine this somehow automatically or >>> at least come up with a better default value than 1 core per task. >>> >>> >>> >>> Does somebody have ideas or thoughts on how to solve this best? >>> >>> >>> >>> Best, >>> >>> David >>> >>> >>> >>> >>> >>> >>> >> >> -- Anirudh Ramanathan