neenu created SPARK-25743:
-----------------------------

Summary: New executors are not launched for kubernetes spark thrift on deleting existing executors
Key: SPARK-25743
URL: https://issues.apache.org/jira/browse/SPARK-25743
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.2.0
Environment: Physical lab configurations.
8 bare-metal servers, each with 56 cores and 384 GB RAM, RHEL 7.4
Kernel: 3.10.0-862.9.1.el7.x86_64
redhat-release-server.x86_64 7.4-18.el7

Kubernetes info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Reporter: neenu

Launched the Spark Thrift server in a Kubernetes cluster with dynamic allocation enabled.

Configurations set:
spark.executor.memory=35g
spark.executor.cores=8
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=10
spark.dynamicAllocation.cachedExecutorIdleTimeout=15
spark.driver.memory=10g
spark.driver.cores=4
spark.sql.crossJoin.enabled=true
spark.sql.starJoinOptimization=true
spark.sql.codegen=true
spark.rpc.numRetries=5
spark.rpc.retry.wait=5
spark.sql.broadcastTimeout=1200
spark.network.timeout=1800
spark.dynamicAllocation.maxExecutors=15
spark.kubernetes.allocation.batch.size=2
spark.kubernetes.allocation.batch.delay=9
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kubernetes.node.selector.is_control=false

Tried to run TPC-DS queries on 1 TB of Snappy-compressed Parquet data. Found that as the execution progresses, all tasks are run by a single executor (executor 53) and no new executors are spawned, even though there are enough resources to spawn more executors.

Tried manually deleting executor pod 53 and saw that no new executor was spawned to replace it.
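As a rough check of the statement that the cluster has enough resources, the peak executor demand implied by the settings above can be compared with the raw hardware totals from the Environment field. This is only a sketch: the helper name `peak_executor_demand` is hypothetical, and the calculation ignores executor memory overhead, the driver, other workloads on the cluster, and the fact that the `is_control=false` node selector may exclude some of the 8 nodes.

```python
# Hardware totals as stated in the Environment field of the report.
NODES = 8
CORES_PER_NODE = 56
RAM_GB_PER_NODE = 384

# Subset of the reported Spark settings that drive executor sizing.
conf = {
    "spark.executor.memory": "35g",
    "spark.executor.cores": "8",
    "spark.dynamicAllocation.maxExecutors": "15",
}


def peak_executor_demand(conf):
    """Return (cores, memory_gb) requested if maxExecutors executors run at once.

    Hypothetical helper for illustration; assumes memory is given in whole
    gigabytes with a trailing 'g', as in the report.
    """
    n = int(conf["spark.dynamicAllocation.maxExecutors"])
    cores = n * int(conf["spark.executor.cores"])
    mem_gb = n * int(conf["spark.executor.memory"].rstrip("g"))
    return cores, mem_gb


cores, mem_gb = peak_executor_demand(conf)
print(f"cores:  {cores} requested of {NODES * CORES_PER_NODE}")
print(f"memory: {mem_gb} GB requested of {NODES * RAM_GB_PER_NODE} GB")
```

Under these assumptions, 15 executors would ask for 120 cores and 525 GB, well below the raw totals of 448 cores and 3072 GB, which is consistent with the observation that capacity is not what prevents new executors from starting.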
Attached the

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org