If YARN has only 50 cores, then it can support at most 49 executors plus 1 driver ApplicationMaster.
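(Roughly, and assuming the default of one vCore per executor, since no --executor-cores is passed in the command below: the ApplicationMaster container itself takes 1 vCore, leaving 50 - 1 = 49 vCores for executors, so a request for 50 executor containers can never be fully granted.)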
Regards,
Sab

On 24-Nov-2015 1:58 pm, "谢廷稳" <xieting...@gmail.com> wrote:

> OK, yarn.scheduler.maximum-allocation-mb is 16384.
>
> I have run it again; the command to run it is:
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>
>> 15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
>> 15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1447834709734_0120_000001
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization
>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
>> 15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>> 15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 15/11/24 16:15:59 INFO Remoting: Starting remoting
>> 15/11/24 16:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@X.X.X.X]
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 61904.
>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
>> 15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory at /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
>> 15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 1966.1 MB
>> 15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory is /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
>> 15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14692
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 14692.
>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>> 15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:15948
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'SparkUI' on port 15948.
>> 15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
>> 15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
>> 15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
>> 15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created on 41830
>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
>> 15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 10.12.30.2, 41830)
>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
>> 15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
>> 15/11/24 16:16:00 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
>> 15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at X.X.X.X
>> 15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the ApplicationMaster
>> 15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
>> 15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>> 15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
>> 15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 200 output partitions
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: List()
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=2061647216
>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1966.1 MB)
>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=2061647216
>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
>> 15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 1966.1 MB)
>> 15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
>> 15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
>> 15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
>> 15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>
> 2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>
>> What about this configuration in YARN: "yarn.scheduler.maximum-allocation-mb"?
>>
>> I'm curious why 49 executors work but 50 fails. Would you provide your
>> ApplicationMaster log? If a container request is issued, there will be log lines like:
>>
>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>>
>> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>
>>> OK, the YARN conf is listed below:
>>>
>>> yarn.nodemanager.resource.memory-mb: 115200
>>> yarn.nodemanager.resource.cpu-vcores: 50
>>>
>>> I think the YARN resources are sufficient. As I said in my previous email,
>>> I think the Spark application didn't request resources from YARN.
>>>
>>> Thanks
>>>
>>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <cherrywayb...@gmail.com>:
>>>
>>>> Can you show your parameter values in your env?
>>>> yarn.nodemanager.resource.cpu-vcores
>>>> yarn.nodemanager.resource.memory-mb
>>>>
>>>> ------------------------------
>>>> cherrywayb...@gmail.com
>>>>
>>>> *From:* 谢廷稳 <xieting...@gmail.com>
>>>> *Date:* 2015-11-24 12:13
>>>> *To:* Saisai Shao <sai.sai.s...@gmail.com>
>>>> *CC:* spark users <user@spark.apache.org>
>>>> *Subject:* Re: A Problem About Running Spark 1.5 on YARN with Dynamic Allocation
>>>> OK, the YARN cluster is used only by me. It has 6 nodes, which can run
>>>> over 100 executors, and the YARN RM logs showed that the Spark application
>>>> did not request resources from it.
>>>>
>>>> Is this a bug? Should I create a JIRA for this problem?
>>>>
>>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>
>>>>> OK, so this looks like your YARN cluster does not allocate the containers,
>>>>> which you expected to be 50. Does the YARN cluster have enough resources left
>>>>> after allocating the AM container? If not, that is the problem.
>>>>>
>>>>> From my reading of your description, the problem does not lie in dynamic
>>>>> allocation. As I said, setting min and max executors to the same number is
>>>>> fine with me.
>>>>>
>>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>
>>>>>> Hi Saisai,
>>>>>> I'm sorry I did not describe it clearly. The YARN debug log said I have
>>>>>> 50 executors, but the ResourceManager showed that I only have 1 container,
>>>>>> for the AppMaster.
>>>>>>
>>>>>> I have checked the YARN RM logs: after the AppMaster changed state
>>>>>> from ACCEPTED to RUNNING, there were no further log entries about this job.
>>>>>> So the problem is that I do not have any executors, but
>>>>>> ExecutorAllocationManager thinks I do. Would you mind running a test in
>>>>>> your cluster environment?
>>>>>> Thanks,
>>>>>> Weber
>>>>>>
>>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>
>>>>>>> I think this behavior is expected: since you already have 50
>>>>>>> executors launched, there is no need to acquire additional executors. Your
>>>>>>> change is not solid; it just hides the log message.
>>>>>>>
>>>>>>> Again, I think you should check the logs of YARN and Spark to see if the
>>>>>>> executors are started correctly. Why are resources still not enough when
>>>>>>> you already have 50 executors?
>>>>>>>
>>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi SaiSai,
>>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)" to
>>>>>>>> "if (numExecutorsTarget > maxNumExecutors)" on the first line of
>>>>>>>> ExecutorAllocationManager#addExecutors(), and it ran well.
>>>>>>>> In my opinion, when I set minExecutors equal to maxExecutors, then the
>>>>>>>> first time executors are to be added, numExecutorsTarget already equals
>>>>>>>> maxNumExecutors, and it repeatedly prints "DEBUG ExecutorAllocationManager:
>>>>>>>> Not adding executors because our current target total is already 50 (limit 50)".
>>>>>>>> Thanks
>>>>>>>> Weber
>>>>>>>>
>>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Tingwen,
>>>>>>>>>
>>>>>>>>> Would you mind sharing your changes in
>>>>>>>>> ExecutorAllocationManager#addExecutors()?
>>>>>>>>>
>>>>>>>>> From my understanding and testing, dynamic allocation works when you
>>>>>>>>> set the min and max number of executors to the same number.
>>>>>>>>>
>>>>>>>>> Please check your Spark and YARN logs to make sure the executors are
>>>>>>>>> correctly started; the warning log means that there are currently not
>>>>>>>>> enough resources to submit tasks.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Saisai
>>>>>>>>>
>>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I ran SparkPi on YARN with dynamic allocation enabled and set
>>>>>>>>>> spark.dynamicAllocation.maxExecutors equal to
>>>>>>>>>> spark.dynamicAllocation.minExecutors, then I submitted the application using:
>>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>>>>>>>>>> --master yarn-cluster --driver-memory 4g --executor-memory 8g
>>>>>>>>>> lib/spark-examples*.jar 200
>>>>>>>>>>
>>>>>>>>>> The application was submitted successfully, but the AppMaster keeps
>>>>>>>>>> saying "15/11/23 20:13:08 WARN cluster.YarnClusterScheduler: Initial job
>>>>>>>>>> has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>> workers are registered and have sufficient resources",
>>>>>>>>>> and when I turned on DEBUG logging, I found "15/11/23 20:24:00 DEBUG
>>>>>>>>>> ExecutorAllocationManager: Not adding executors because our current target
>>>>>>>>>> total is already 50 (limit 50)" in the console.
>>>>>>>>>>
>>>>>>>>>> I have fixed it by modifying code in
>>>>>>>>>> ExecutorAllocationManager.addExecutors. Is this a bug, or is it by design
>>>>>>>>>> that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Weber
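To make the guard being debated above concrete, here is a minimal, hypothetical Scala sketch (not the actual Spark 1.5 ExecutorAllocationManager source) of how the ">=" check and the proposed ">" variant behave when minExecutors equals maxExecutors; the names numExecutorsTarget and maxNumExecutors simply mirror the fields mentioned in the thread.

// Minimal hypothetical sketch of the check discussed in this thread. It is NOT the
// real Spark source; it only illustrates how the ">=" guard and the proposed ">"
// variant differ when the initial target already equals the limit.
object AddExecutorsGuardSketch {

  // Spark 1.5-style guard: stop requesting as soon as the target reaches the limit.
  def blockedWithGte(numExecutorsTarget: Int, maxNumExecutors: Int): Boolean =
    numExecutorsTarget >= maxNumExecutors

  // The change described above: only stop once the target exceeds the limit.
  def blockedWithGt(numExecutorsTarget: Int, maxNumExecutors: Int): Boolean =
    numExecutorsTarget > maxNumExecutors

  def main(args: Array[String]): Unit = {
    // With minExecutors == maxExecutors == 50, the target starts at 50:
    println(blockedWithGte(50, 50)) // true  -> "Not adding executors ... already 50 (limit 50)"
    println(blockedWithGt(50, 50))  // false -> an executor request would still be issued
  }
}

As Saisai points out above, relaxing this guard may only hide the debug message; the separate question the thread keeps returning to is whether YARN ever granted the executor containers at all.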