Following on from the above, I ran some tests, leaving 1 vCPU out of 4 vCPUs to the OS on each node (driver container and executors) in a three-node GKE cluster. The RAM allocated to each node was 16GB. I set the initial driver container AND executor memory to 10% of RAM, incremented it in steps of 10% from 10% to 50%, and measured the time taken for the code to run from start to finish. Basically a simple

    import time

    start_time = time.time()
    # ... the code being timed runs here ...
    end_time = time.time()
    time_elapsed = (end_time - start_time)

which measured the completion time in seconds. The systematics were kept the same for all measurements, and only one measurement was taken at each memory setting, i.e.

    --conf spark.driver.memory=<Memory in MB> \
    --conf spark.executor.memory=<Memory in MB> \

The memory was set to the same value for both the driver container and the executors. The results I got were as follows:

[image: image.png]

So it appears that allocating about 50% of RAM to both the driver and the executors provides an optimum value. Increasing the memory above 50% (say to 60% = 9830MB) results in the container never being created (the pod is stuck at Pending), presumably because it is trying to grab memory that is not available, as shown below

    k describe pod sparkbq-b506ac7dc521b667-driver -n spark

    Events:
      Type     Reason             Age                   From                Message
      ----     ------             ----                  ----                -------
      Warning  FailedScheduling   17m                   default-scheduler   0/3 nodes are available: 3 Insufficient memory.
      Warning  FailedScheduling   17m                   default-scheduler   0/3 nodes are available: 3 Insufficient memory.
      Normal   NotTriggerScaleUp  2m28s (x92 over 17m)  cluster-autoscaler  pod didn't trigger scale-up:

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Tue, 14 Dec 2021 at 11:28, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> I have a three-node k8s cluster (GKE) in Google Cloud with E2 standard
> machines that have 4 GB of system memory per vCPU, giving 4 vCPUs and
> 16,384MB of RAM.
>
> An optimum sizing of the number of executors, CPU and memory allocation is
> important here. These are the assumptions:
>
> 1. You want to fit exactly one Spark executor pod per Kubernetes node
> 2. You should not starve the node OS, network etc. of CPU
> 3. If you have 3 nodes, one node should be allocated to the driver and
>    two nodes to the executors
> 4. Regardless, you want to execute the code in k8s as fast as possible
>
> I don't think that, with the current architecture, one can force the
> driver node to accommodate both the driver and one executor at the same
> time. I did some tests and looked at the available discussions here
> <https://spark.apache.org/docs/latest/running-on-kubernetes.html> and here
> <https://www.datamechanics.co/blog-post/setting-up-managing-monitoring-spark-on-kubernetes>.
> One can fine tune various parameters, but these seem to be fine
>
>     --conf spark.executor.instances=2 \
>     --conf spark.driver.cores=3 \
>     --conf spark.executor.cores=3 \
>     --conf spark.driver.memory=8000m \
>     --conf spark.executor.memory=8000m \
>
> What I am suggesting here is to leave 1 vCPU out of 4 vCPUs to the OS on
> each node. It is a safer bet to grab half of the memory available on each
> node for the driver and executors. Your mileage may vary, because if you
> try to allocate more memory, it will take longer for the driver and
> executors to spin up (ContainerCreating), meaning that the execution time
> will be longer. This could be offset if you are running a long job and you
> care more about allocating more of the available memory than about the
> ContainerCreating time. It would be interesting to hear from others who
> have tried a similar configuration, and what their experience was.
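For anyone wanting to repeat the memory sweep described in this thread, here is a minimal Python sketch of the arithmetic, assuming 16,384MB per node as stated above. The helper names (memory_mb, memory_conf, timed) are made up for illustration and are not part of Spark or of the original test code. The sketch only builds the two --conf flags and wraps the same time.time() measurement; it does not submit anything.

```python
import time

NODE_RAM_MB = 16384  # RAM per node, as stated in the thread


def memory_mb(percent, node_ram_mb=NODE_RAM_MB):
    """Return `percent` of the node RAM in whole MB (rounded down)."""
    return node_ram_mb * percent // 100


def memory_conf(percent):
    """Build the two --conf flags, same value for driver and executors."""
    mem = f"{memory_mb(percent)}m"
    return [
        "--conf", f"spark.driver.memory={mem}",
        "--conf", f"spark.executor.memory={mem}",
    ]


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds) using time.time()."""
    start_time = time.time()
    result = func(*args, **kwargs)
    end_time = time.time()
    return result, end_time - start_time


if __name__ == "__main__":
    # 10% to 60% in steps of 10%; 60% is the setting that stayed Pending.
    for percent in range(10, 70, 10):
        print(percent, memory_mb(percent), " ".join(memory_conf(percent)))
```

With these assumptions, 50% works out to 8192MB, close to the 8000m used in the quoted message, and 60% works out to 9830MB, which is the value that could not be scheduled.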