pan3793 commented on issue #4629:
URL: https://github.com/apache/kyuubi/issues/4629#issuecomment-1491335588

   > Also it would be great if you can share configuration properties you use 
for Spark to find best minimal config.
   
   We maintain an internal Spark build w/ additional K8s enhancement patches, e.g. external log service integration, window-based executor failure detection, etc. Apart from those patches, only a few configurations differ compared to Spark on Yarn.
   
   Here are the configurations I think are important.
   
   On Yarn we usually set this to 0 (pick a random port), since many drivers may share the same NodeManager; on K8s, listening on port 0 is disallowed, and each driver has its own Pod IP anyway, so a fixed port works.
   ```
   spark.ui.port=4040
   ```
   
   Enable Prometheus metrics
   ```
   spark.metrics.conf.*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
   spark.metrics.conf.*.sink.prometheusServlet.path=/metrics/prometheus
   ```
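
   With the sink enabled, the driver serves metrics in Prometheus exposition format on its UI port under the configured path. One way to spot-check it (the pod name below is a placeholder):
   ```
   # Forward the driver pod's UI port to localhost (pod name is hypothetical)
   kubectl port-forward pod/my-spark-driver-pod 4040:4040 &

   # Fetch the first few metrics lines from the configured path
   curl -s http://localhost:4040/metrics/prometheus | head -n 5
   ```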
   
   Force-terminate the driver on OOM, in case the driver Pod otherwise hangs forever.
   ```
   spark.driver.extraJavaOptions=-XX:OnOutOfMemoryError="kill -9 %p"
   ```
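   The JVM flag can be sanity-checked outside Spark with a throwaway class (the class name, heap size, and `echo` handler below are demo-only; in production the handler is `kill -9 %p` as above):
   ```
   # Demo-only class that exhausts a small heap on purpose
   printf '%s\n' \
     'import java.util.*;' \
     'public class OomDemo {' \
     '    public static void main(String[] a) {' \
     '        List<byte[]> hog = new ArrayList<>();' \
     '        while (true) hog.add(new byte[1 << 20]); // 1 MB per loop' \
     '    }' \
     '}' > OomDemo.java
   javac OomDemo.java

   # The handler runs as soon as the JVM throws the first OutOfMemoryError;
   # with kill -9 %p the JVM would terminate itself instead of echoing
   java -Xmx32m -XX:OnOutOfMemoryError='echo handler fired for pid %p' OomDemo
   ```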
   Based on our internal testing, since Spark 3.2, zstd achieves the same performance and uses ~50% less disk for shuffle data compared w/ the default lz4.
   ```
   spark.io.compression.codec=zstd
   ```
   We disable Netty direct memory usage entirely, to skip the executor memory check at startup; see https://github.com/apache/spark/pull/38901 for details.
   ```
   spark.network.io.preferDirectBufs=false
   spark.shuffle.io.preferDirectBufs=false
   ```
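
   Taken together, the settings above can all be passed on a single `spark-submit`; the master URL, namespace, image, and example jar below are placeholders:
   ```
   spark-submit \
     --master k8s://https://kubernetes.example.com:6443 \
     --deploy-mode cluster \
     --conf spark.kubernetes.namespace=spark \
     --conf spark.kubernetes.container.image=example/spark:3.3.2 \
     --conf spark.ui.port=4040 \
     --conf 'spark.metrics.conf.*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet' \
     --conf 'spark.metrics.conf.*.sink.prometheusServlet.path=/metrics/prometheus' \
     --conf 'spark.driver.extraJavaOptions=-XX:OnOutOfMemoryError="kill -9 %p"' \
     --conf spark.io.compression.codec=zstd \
     --conf spark.network.io.preferDirectBufs=false \
     --conf spark.shuffle.io.preferDirectBufs=false \
     --class org.apache.spark.examples.SparkPi \
     local:///opt/spark/examples/jars/spark-examples.jar
   ```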

