[ https://issues.apache.org/jira/browse/SPARK-25743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651514#comment-16651514 ]
neenu commented on SPARK-25743:
-------------------------------
Also attached the TPC-DS query list that was executed: [^query_0_correct.sql]

> New executors are not launched for kubernetes spark thrift on deleting
> existing executors
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25743
>                 URL: https://issues.apache.org/jira/browse/SPARK-25743
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.2.0
>         Environment: Physical lab configuration:
> 8 bare-metal servers, each with 56 cores and 384 GB RAM, RHEL 7.4
> Kernel: 3.10.0-862.9.1.el7.x86_64
> redhat-release-server.x86_64 7.4-18.el7
>
> Kubernetes info:
> Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
>            Reporter: neenu
>            Priority: Major
>         Attachments: driver.log
>
> Launched the Spark Thrift Server in a Kubernetes cluster with dynamic allocation enabled.
> Configurations set:
> spark.executor.memory=35g
> spark.executor.cores=8
> spark.dynamicAllocation.enabled=true
> spark.dynamicAllocation.executorIdleTimeout=10
> spark.dynamicAllocation.cachedExecutorIdleTimeout=15
> spark.driver.memory=10g
> spark.driver.cores=4
> spark.sql.crossJoin.enabled=true
> spark.sql.starJoinOptimization=true
> spark.sql.codegen=true
> spark.rpc.numRetries=5
> spark.rpc.retry.wait=5
> spark.sql.broadcastTimeout=1200
> spark.network.timeout=1800
> spark.dynamicAllocation.maxExecutors=15
> spark.kubernetes.allocation.batch.size=2
> spark.kubernetes.allocation.batch.delay=9
> spark.serializer=org.apache.spark.serializer.KryoSerializer
> spark.kubernetes.node.selector.is_control=false
>
> Tried to run TPC-DS queries on 1 TB of Parquet (Snappy-compressed) data.
> Found that as the execution progressed, the tasks were all done by a single executor (executor 53) and no new executors were spawned, even though there were enough resources to spawn more executors.
>
> Tried to manually delete executor pod 53 and saw that no new executor was spawned to replace the one that was deleted.
> Attached the driver.log.
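For reference, the reproduction described above can be sketched roughly as follows. This is only a sketch, not the reporter's exact commands: the master URL, namespace, image, and executor pod name are placeholders, and the `spark-role=executor` label selector is an assumption about how the Spark-on-Kubernetes scheduler backend labels executor pods.

```shell
# Launch the Spark Thrift Server against Kubernetes with the dynamic-allocation
# settings from the report (placeholder master URL; remaining confs elided).
./sbin/start-thriftserver.sh \
  --master k8s://https://<api-server>:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=10 \
  --conf spark.dynamicAllocation.cachedExecutorIdleTimeout=15 \
  --conf spark.dynamicAllocation.maxExecutors=15 \
  --conf spark.kubernetes.allocation.batch.size=2 \
  --conf spark.kubernetes.allocation.batch.delay=9

# List the executor pods (label selector assumed) and delete one:
kubectl get pods -l spark-role=executor
kubectl delete pod <executor-pod-name>

# Expected: the allocator spawns a replacement executor.
# Observed (per this report): no replacement pod ever appears.
kubectl get pods -l spark-role=executor --watch
```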