neenu created SPARK-25679:
-----------------------------
Summary: OOM Killed observed for spark thrift executors with
dynamic allocation enabled
Key: SPARK-25679
URL: https://issues.apache.org/jira/browse/SPARK-25679
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.2.0
Environment: Physical lab configuration.
8 bare-metal servers,
each with 56 cores, 384GB RAM, RHEL 7.4
Kernel : 3.10.0-862.9.1.el7.x86_64
redhat-release-server.x86_64 7.4-18.el7
Spark Thrift server configuration:
driver memory: 10GB
driver cores: 4
executor memory: 35GB
executor cores: 8
Kubernetes info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2",
GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean",
BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc",
Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2",
GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean",
BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc",
Platform:"linux/amd64"}
Reporter: neenu
Spark Thrift server executors are getting killed with OOM errors when dynamic
allocation is enabled.
I ran TPC-DS queries on 1TB of Snappy-compressed Parquet data, with executor
memory set to 35GB and 8 cores per executor. The maximum number of executors
was set to 100, and around 30 executors were running at a time.
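
For reference, the settings described above map roughly onto the following Spark properties. This is only a sketch: the property names are standard Spark configuration keys, but how the values were actually passed to the Thrift server is not stated in the report, so the `settings` dict and the `to_conf_flags` helper are illustrative assumptions.

```python
# Values taken from the report. The way they were supplied to the Thrift
# server is an assumption; this only renders them as --conf arguments.
settings = {
    "spark.driver.memory": "10g",
    "spark.driver.cores": "4",
    "spark.executor.memory": "35g",
    "spark.executor.cores": "8",
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.maxExecutors": "100",
}


def to_conf_flags(conf):
    """Render a dict of Spark properties as a flat list of --conf arguments."""
    flags = []
    for key, value in conf.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Print the flags on one line, ready to append to a launch command.
    print(" ".join(to_conf_flags(settings)))
```

The resulting flags could be appended to whatever launch command is in use.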
Since dynamic allocation is enabled and Spark decides the number of executors
to spawn, should there be OOM errors at all? Couldn't Spark launch more
executors to avoid them?
Note: There were enough cluster resources available to launch more executors if
needed.
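
As a rough check of the note above, the hardware figures from the Environment section can be turned into an upper bound on how many executors would fit. This is a back-of-the-envelope sketch: it assumes each node's full cores and RAM are allocatable, and it ignores the driver, per-executor memory overhead, and any OS or Kubernetes reservations, so the real limit would be lower.

```python
# Figures from the report.
NODES = 8
CORES_PER_NODE = 56
RAM_GB_PER_NODE = 384
EXECUTOR_CORES = 8
EXECUTOR_MEMORY_GB = 35
OBSERVED_EXECUTORS = 30


def executors_per_node(node_cores, node_ram_gb, exec_cores, exec_mem_gb):
    """Executors that fit on one node, limited by the scarcer resource."""
    by_cores = node_cores // exec_cores
    by_memory = node_ram_gb // exec_mem_gb
    return min(by_cores, by_memory)


def cluster_capacity():
    """Upper bound on executors across all nodes (optimistic assumptions)."""
    per_node = executors_per_node(
        CORES_PER_NODE, RAM_GB_PER_NODE, EXECUTOR_CORES, EXECUTOR_MEMORY_GB
    )
    return per_node * NODES


if __name__ == "__main__":
    capacity = cluster_capacity()
    print("upper bound on executors:", capacity)
    print("headroom over observed:", capacity - OBSERVED_EXECUTORS)
```

Under these assumptions cores are the limiting resource (7 executors per node, 56 in total), which is consistent with about 30 executors leaving room for more, although still below the configured maximum of 100.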