If YARN has only 50 cores, then it can support at most 49 executors plus 1 driver ApplicationMaster.
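(Roughly, and assuming the default of one vCore per executor, since no --executor-cores is passed in the command below: the ApplicationMaster container itself takes 1 vCore, leaving 50 - 1 = 49 vCores for executors, so a request for 50 executor containers can never be fully granted.)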
Regards,
Sab

On 24-Nov-2015 1:58 pm, "谢廷稳" <xieting...@gmail.com> wrote:

> OK, yarn.scheduler.maximum-allocation-mb is 16384.
>
> I have run it again; the command to run it is:
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>
>> 15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
>> 15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1447834709734_0120_000001
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization
>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
>> 15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>> 15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 15/11/24 16:15:59 INFO Remoting: Starting remoting
>> 15/11/24 16:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@X.X.X.X]
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 61904.
>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
>> 15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory at /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
>> 15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 1966.1 MB
>> 15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory is /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
>> 15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14692
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 14692.
>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>> 15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:15948
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'SparkUI' on port 15948.
>> 15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
>> 15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
>> 15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
>> 15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created on 41830
>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
>> 15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 10.12.30.2, 41830)
>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
>> 15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
>> 15/11/24 16:16:00 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
>> 15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at X.X.X.X
>> 15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the ApplicationMaster
>> 15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
>> 15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>> 15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
>> 15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 200 output partitions
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: List()
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=2061647216
>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1966.1 MB)
>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=2061647216
>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
>> 15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 1966.1 MB)
>> 15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
>> 15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
>> 15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
>> 15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>> 15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>
> 2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>
>> What about this configuration in YARN: "yarn.scheduler.maximum-allocation-mb"?
>>
>> I'm curious why 49 executors work but 50 fails. Would you provide your
>> ApplicationMaster log? If a container request is issued, there will be log lines like:
>>
>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>>
>> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>
>>> OK, the YARN conf is listed below:
>>>
>>> yarn.nodemanager.resource.memory-mb: 115200
>>> yarn.nodemanager.resource.cpu-vcores: 50
>>>
>>> I think the YARN resources are sufficient. As I said in my previous email,
>>> I think the Spark application didn't request resources from YARN.
>>>
>>> Thanks
>>>
>>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <cherrywayb...@gmail.com>:
>>>
>>>> Can you show your parameter values in your env?
>>>> yarn.nodemanager.resource.cpu-vcores
>>>> yarn.nodemanager.resource.memory-mb
>>>>
>>>> ------------------------------
>>>> cherrywayb...@gmail.com
>>>>
>>>> *From:* 谢廷稳 <xieting...@gmail.com>
>>>> *Date:* 2015-11-24 12:13
>>>> *To:* Saisai Shao <sai.sai.s...@gmail.com>
>>>> *CC:* spark users <user@spark.apache.org>
>>>> *Subject:* Re: A Problem About Running Spark 1.5 on YARN with Dynamic Allocation
>>>> OK, the YARN cluster is used only by me. It has 6 nodes, which can run
>>>> over 100 executors, and the YARN RM logs showed that the Spark application
>>>> did not request resources from it.
>>>>
>>>> Is this a bug? Should I create a JIRA for this problem?
>>>>
>>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>
>>>>> OK, so this looks like your YARN cluster does not allocate the containers,
>>>>> which you expected to be 50. Does the YARN cluster have enough resources left
>>>>> after allocating the AM container? If not, that is the problem.
>>>>>
>>>>> From my reading of your description, the problem does not lie in dynamic
>>>>> allocation. As I said, setting min and max executors to the same number is
>>>>> fine with me.
>>>>>
>>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>
>>>>>> Hi Saisai,
>>>>>> I'm sorry I did not describe it clearly. The YARN debug log said I have
>>>>>> 50 executors, but the ResourceManager showed that I only have 1 container,
>>>>>> for the AppMaster.
>>>>>>
>>>>>> I have checked the YARN RM logs: after the AppMaster changed state
>>>>>> from ACCEPTED to RUNNING, there were no further log entries about this job.
>>>>>> So the problem is that I do not have any executors, but
>>>>>> ExecutorAllocationManager thinks I do. Would you mind running a test in
>>>>>> your cluster environment?
>>>>>> Thanks,
>>>>>> Weber
>>>>>>
>>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>
>>>>>>> I think this behavior is expected: since you already have 50
>>>>>>> executors launched, there is no need to acquire additional executors. Your
>>>>>>> change is not solid; it just hides the log message.
>>>>>>>
>>>>>>> Again, I think you should check the logs of YARN and Spark to see if the
>>>>>>> executors are started correctly. Why are resources still not enough when
>>>>>>> you already have 50 executors?
>>>>>>>
>>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi SaiSai,
>>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)" to
>>>>>>>> "if (numExecutorsTarget > maxNumExecutors)" on the first line of
>>>>>>>> ExecutorAllocationManager#addExecutors(), and it ran well.
>>>>>>>> In my opinion, when I set minExecutors equal to maxExecutors, then the
>>>>>>>> first time executors are to be added, numExecutorsTarget already equals
>>>>>>>> maxNumExecutors, and it repeatedly prints "DEBUG ExecutorAllocationManager:
>>>>>>>> Not adding executors because our current target total is already 50 (limit 50)".
>>>>>>>> Thanks
>>>>>>>> Weber
>>>>>>>>
>>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Tingwen,
>>>>>>>>>
>>>>>>>>> Would you mind sharing your changes in
>>>>>>>>> ExecutorAllocationManager#addExecutors()?
>>>>>>>>>
>>>>>>>>> From my understanding and testing, dynamic allocation works when you
>>>>>>>>> set the min and max number of executors to the same number.
>>>>>>>>>
>>>>>>>>> Please check your Spark and YARN logs to make sure the executors are
>>>>>>>>> correctly started; the warning log means that there are currently not
>>>>>>>>> enough resources to submit tasks.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Saisai
>>>>>>>>>
>>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I ran SparkPi on YARN with dynamic allocation enabled and set
>>>>>>>>>> spark.dynamicAllocation.maxExecutors equal to
>>>>>>>>>> spark.dynamicAllocation.minExecutors, then I submitted the application using:
>>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi
>>>>>>>>>> --master yarn-cluster --driver-memory 4g --executor-memory 8g
>>>>>>>>>> lib/spark-examples*.jar 200
>>>>>>>>>>
>>>>>>>>>> The application was submitted successfully, but the AppMaster keeps
>>>>>>>>>> saying "15/11/23 20:13:08 WARN cluster.YarnClusterScheduler: Initial job
>>>>>>>>>> has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>> workers are registered and have sufficient resources",
>>>>>>>>>> and when I turned on DEBUG logging, I found "15/11/23 20:24:00 DEBUG
>>>>>>>>>> ExecutorAllocationManager: Not adding executors because our current target
>>>>>>>>>> total is already 50 (limit 50)" in the console.
>>>>>>>>>>
>>>>>>>>>> I have fixed it by modifying code in
>>>>>>>>>> ExecutorAllocationManager.addExecutors. Is this a bug, or is it by design
>>>>>>>>>> that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Weber
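To make the guard being debated above concrete, here is a minimal, hypothetical Scala sketch (not the actual Spark 1.5 ExecutorAllocationManager source) of how the ">=" check and the proposed ">" variant behave when minExecutors equals maxExecutors; the names numExecutorsTarget and maxNumExecutors simply mirror the fields mentioned in the thread.

// Minimal hypothetical sketch of the check discussed in this thread. It is NOT the
// real Spark source; it only illustrates how the ">=" guard and the proposed ">"
// variant differ when the initial target already equals the limit.
object AddExecutorsGuardSketch {

  // Spark 1.5-style guard: stop requesting as soon as the target reaches the limit.
  def blockedWithGte(numExecutorsTarget: Int, maxNumExecutors: Int): Boolean =
    numExecutorsTarget >= maxNumExecutors

  // The change described above: only stop once the target exceeds the limit.
  def blockedWithGt(numExecutorsTarget: Int, maxNumExecutors: Int): Boolean =
    numExecutorsTarget > maxNumExecutors

  def main(args: Array[String]): Unit = {
    // With minExecutors == maxExecutors == 50, the target starts at 50:
    println(blockedWithGte(50, 50)) // true  -> "Not adding executors ... already 50 (limit 50)"
    println(blockedWithGt(50, 50))  // false -> an executor request would still be issued
  }
}

As Saisai points out above, relaxing this guard may only hide the debug message; the separate question the thread keeps returning to is whether YARN ever granted the executor containers at all.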