Hi,

Brilliant!  Working now.

Thank you very much,

On 05/06/16 18:09, Darin Johnson wrote:
> Stephen,
>
> I was able to recreate the problem (it's specific to 2.7.2, which changed the
> defaults of the following two properties to true).  Setting them to false
> allowed me to run MapReduce jobs again.  I'll try to update the
> documentation later today.
>
>    <property>
>      <name>yarn.nodemanager.pmem-check-enabled</name>
>      <value>false</value>
>    </property>
>
>    <property>
>      <name>yarn.nodemanager.vmem-check-enabled</name>
>      <value>false</value>
>    </property>
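>
> To confirm the override took effect, here is a minimal sketch (assuming the
> Hadoop client jars and the cluster's yarn-site.xml are on the classpath) that
> prints the effective values of the two flags; the fallback values mirror the
> 2.7.2 defaults mentioned above:
>
>     import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
>     // Prints the effective values of the two memory-enforcement flags.
>     // YarnConfiguration loads yarn-site.xml from the classpath, so this
>     // reflects whatever the NodeManager would actually see.
>     public class CheckMemEnforcement {
>         public static void main(String[] args) {
>             YarnConfiguration conf = new YarnConfiguration();
>             System.out.println("yarn.nodemanager.pmem-check-enabled = "
>                     + conf.getBoolean("yarn.nodemanager.pmem-check-enabled", true));
>             System.out.println("yarn.nodemanager.vmem-check-enabled = "
>                     + conf.getBoolean("yarn.nodemanager.vmem-check-enabled", true));
>         }
>     }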
>
> Darin
>
> On Sun, Jun 5, 2016 at 10:30 AM, Stephen Gran <stephen.g...@piksel.com>
> wrote:
>
>> Hi,
>>
>> I think those are the properties I added when I started getting this
>> error.  Removing them doesn't seem to make any difference, sadly.
>>
>> This is Hadoop 2.7.2.
>>
>> Cheers,
>>
>> On 05/06/16 14:45, Darin Johnson wrote:
>>> Hey Stephen,
>>>
>>> I think you're pretty close.
>>>
>>> Looking at the config, I'd suggest removing these properties:
>>>
>>>     <property>
>>>       <name>yarn.nodemanager.resource.memory-mb</name>
>>>       <value>4096</value>
>>>     </property>
>>>     <property>
>>>       <name>yarn.scheduler.maximum-allocation-vcores</name>
>>>       <value>12</value>
>>>     </property>
>>>     <property>
>>>       <name>yarn.scheduler.maximum-allocation-mb</name>
>>>       <value>8192</value>
>>>     </property>
>>>     <property>
>>>       <name>yarn.nodemanager.vmem-check-enabled</name>
>>>       <value>false</value>
>>>       <description>Whether virtual memory limits will be enforced for
>>>       containers</description>
>>>     </property>
>>>     <property>
>>>       <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>       <value>4</value>
>>>       <description>Ratio between virtual memory to physical memory when
>>>       setting memory limits for containers</description>
>>>     </property>
>>>
>>> I'll try them out on my test cluster later today/tonight and see if I can
>>> recreate the problem.  What version of Hadoop are you running?  I'll make
>>> sure I'm consistent with that as well.
>>>
>>> Thanks,
>>>
>>> Darin
>>> On Jun 5, 2016 8:15 AM, "Stephen Gran" <stephen.g...@piksel.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Attached.  Thanks very much for looking.
>>>>
>>>> Cheers,
>>>>
>>>> On 05/06/16 12:51, Darin Johnson wrote:
>>>>> Hey Stephen, can you please send your yarn-site.xml?  I'm guessing you're
>>>>> on the right track.
>>>>>
>>>>> Darin
>>>>> Hi,
>>>>>
>>>>> OK.  That helps, thank you.  I think I just misunderstood the docs (or
>>>>> they never said explicitly that you did need at least some static
>>>>> resource), and I scaled down the initial nm.medium that got started.  I
>>>>> get a bit further now, and jobs start but are killed with:
>>>>>
>>>>> Diagnostics: Container
>>>>> [pid=3865,containerID=container_1465112239753_0001_03_000001] is
>> running
>>>>> beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
>>>>> memory used; 2.6 GB of 0B virtual memory used. Killing container
>>>>>
>>>>> When I've seen this in the past with YARN but without Myriad, it was
>>>>> usually about the vmem-to-pmem ratio and similar settings - I've tried
>>>>> some of those knobs, but I didn't expect much result and didn't get any.
>>>>>
>>>>> What strikes me about the error message is that the vmem and mem
>>>>> allocations are for 0.
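>>>>>
>>>>> My rough reading of why that matters (a simplified sketch of the check I
>>>>> assume the NM's container monitor performs, not the actual Hadoop code):
>>>>> the vmem limit is the physical allocation times the vmem-pmem ratio, so a
>>>>> 0 MB allocation means a 0 B limit and any usage at all is over it.
>>>>>
>>>>>     // Illustrative sketch only, not the Hadoop source.
>>>>>     public class VmemCheckSketch {
>>>>>         public static void main(String[] args) {
>>>>>             long pmemLimitBytes = 0L;            // container allocated <memory:0>
>>>>>             double vmemPmemRatio = 4.0;          // yarn.nodemanager.vmem-pmem-ratio
>>>>>             long vmemLimitBytes = (long) (pmemLimitBytes * vmemPmemRatio); // = 0
>>>>>
>>>>>             long vmemUsedBytes = 2_791_728_742L; // ~2.6 GB, as in the diagnostics
>>>>>
>>>>>             // With a zero limit, any non-zero usage trips the check and the
>>>>>             // container is killed.
>>>>>             System.out.println("over vmem limit? " + (vmemUsedBytes > vmemLimitBytes));
>>>>>         }
>>>>>     }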
>>>>>
>>>>> I'm sorry for asking what are probably naive questions here, but I
>>>>> couldn't find a different forum.  If there is one, please point me there
>>>>> so I don't disrupt the dev flow.
>>>>>
>>>>> I can see this in the logs:
>>>>>
>>>>>
>>>>> 2016-06-05 07:39:25,687 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>>>>> container_1465112239753_0001_03_000001 Container Transitioned from NEW
>>>>> to ALLOCATED
>>>>> 2016-06-05 07:39:25,688 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
>>>>>        OPERATION=AM Allocated Container        TARGET=SchedulerApp
>>>>> RESULT=SUCCESS  APPID=application_1465112239753_0001
>>>>> CONTAINERID=container_1465112239753_0001_03_000001
>>>>> 2016-06-05 07:39:25,688 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
>>>>> Assigned container container_1465112239753_0001_03_000001 of capacity
>>>>> <memory:0, vCores:0> on host slave2.testing.local:26688, which has 1
>>>>> containers, <memory:0, vCores:0> used and <memory:4096, vCores:1>
>>>>> available after allocation
>>>>> 2016-06-05 07:39:25,689 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
>>>>> Sending NMToken for nodeId : slave2.testing.local:26688 for container :
>>>>> container_1465112239753_0001_03_000001
>>>>> 2016-06-05 07:39:25,696 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>>>>> container_1465112239753_0001_03_000001 Container Transitioned from
>>>>> ALLOCATED to ACQUIRED
>>>>> 2016-06-05 07:39:25,696 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
>>>>> Clear node set for appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:25,696 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> Storing attempt: AppId: application_1465112239753_0001 AttemptId:
>>>>> appattempt_1465112239753_0001_000003 MasterContainer: Container:
>>>>> [ContainerId: container_1465112239753_0001_03_000001, NodeId:
>>>>> slave2.testing.local:26688, NodeHttpAddress:
>> slave2.testing.local:24387,
>>>>> Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
>>>>> ContainerToken, service: 10.0.5.5:26688 }, ]
>>>>> 2016-06-05 07:39:25,697 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> appattempt_1465112239753_0001_000003 State change from SCHEDULED to
>>>>> ALLOCATED_SAVING
>>>>> 2016-06-05 07:39:25,698 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> appattempt_1465112239753_0001_000003 State change from ALLOCATED_SAVING
>>>>> to ALLOCATED
>>>>> 2016-06-05 07:39:25,699 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>>>>> Launching masterappattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:25,705 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>>>>> Setting up container Container: [ContainerId:
>>>>> container_1465112239753_0001_03_000001, NodeId:
>>>>> slave2.testing.local:26688, NodeHttpAddress:
>> slave2.testing.local:24387,
>>>>> Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
>>>>> ContainerToken, service: 10.0.5.5:26688 }, ] for AM
>>>>> appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:25,705 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>>>>> Command to launch container container_1465112239753_0001_03_000001 :
>>>>> $JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp
>>>>> -Dlog4j.configuration=container-log4j.properties
>>>>> -Dyarn.app.container.log.dir=<LOG_DIR>
>>>>> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>>>>> -Dhadoop.root.logfile=syslog  -Xmx1024m
>>>>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout
>>>>> 2><LOG_DIR>/stderr
>>>>> 2016-06-05 07:39:25,706 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
>>>>> Create AMRMToken for ApplicationAttempt:
>>>>> appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:25,707 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
>>>>> Creating password for appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:25,727 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher:
>>>>> Done launching container Container: [ContainerId:
>>>>> container_1465112239753_0001_03_000001, NodeId:
>>>>> slave2.testing.local:26688, NodeHttpAddress:
>> slave2.testing.local:24387,
>>>>> Resource: <memory:0, vCores:0>, Priority: 0, Token: Token { kind:
>>>>> ContainerToken, service: 10.0.5.5:26688 }, ] for AM
>>>>> appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:25,728 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> appattempt_1465112239753_0001_000003 State change from ALLOCATED to
>>>> LAUNCHED
>>>>> 2016-06-05 07:39:25,736 WARN
>>>>> org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler:
>>>>> Task: yarn_container_1465112239753_0001_03_000001 not found, status:
>>>>> TASK_RUNNING
>>>>> 2016-06-05 07:39:26,510 INFO org.apache.hadoop.yarn.util.RackResolver:
>>>>> Resolved slave1.testing.local to /default-rack
>>>>> 2016-06-05 07:39:26,517 WARN
>>>>> org.apache.myriad.scheduler.fgs.NMHeartBeatHandler: FineGrainedScaling
>>>>> feature got invoked for a NM with non-zero capacity. Host:
>>>>> slave1.testing.local, Mem: 4096, CPU: 0. Setting the NM's capacity to
>>>>> (0G,0CPU)
>>>>> 2016-06-05 07:39:26,517 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
>>>>> slave1.testing.local:29121 Node Transitioned from NEW to RUNNING
>>>>> 2016-06-05 07:39:26,518 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>>>>> Added node slave1.testing.local:29121 cluster capacity: <memory:4096,
>>>>> vCores:1>
>>>>> 2016-06-05 07:39:26,519 INFO
>>>>> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager:
>>>>> afterSchedulerEventHandled: NM registration from node
>>>> slave1.testing.local
>>>>> 2016-06-05 07:39:26,528 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
>>>>> received container statuses on node manager register :[container_id {
>>>>> app_attempt_id { application_id { id: 1 cluster_timestamp:
>> 1465112239753
>>>>> } attemptId: 2 } id: 1 } container_state: C_RUNNING resource { memory:
>> 0
>>>>> virtual_cores: 0 } priority { priority: 0 } diagnostics: ""
>>>>> container_exit_status: -1000 creation_time: 1465112356478]
>>>>> 2016-06-05 07:39:26,530 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
>>>>> NodeManager from node slave1.testing.local(cmPort: 29121 httpPort:
>>>>> 20456) registered with capability: <memory:0, vCores:0>, assigned
>> nodeId
>>>>> slave1.testing.local:29121
>>>>> 2016-06-05 07:39:26,611 INFO
>>>>> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting
>>>>> capacity for node slave1.testing.local to <memory:4637, vCores:6>
>>>>> 2016-06-05 07:39:26,611 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>>>>> Update resource on node: slave1.testing.local from: <memory:0,
>>>>> vCores:0>, to: <memory:4637, vCores:6>
>>>>> 2016-06-05 07:39:26,615 INFO
>>>>> org.apache.myriad.scheduler.fgs.YarnNodeCapacityManager: Setting
>>>>> capacity for node slave1.testing.local to <memory:0, vCores:0>
>>>>> 2016-06-05 07:39:26,616 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>>>>> Update resource on node: slave1.testing.local from: <memory:4637,
>>>>> vCores:6>, to: <memory:0, vCores:0>
>>>>> 2016-06-05 07:39:26,691 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>>>>> container_1465112239753_0001_03_000001 Container Transitioned from
>>>>> ACQUIRED to RUNNING
>>>>> 2016-06-05 07:39:26,835 WARN
>>>>> org.apache.myriad.scheduler.event.handlers.StatusUpdateEventHandler:
>>>>> Task: yarn_container_1465112239753_0001_03_000001 not found, status:
>>>>> TASK_FINISHED
>>>>> 2016-06-05 07:39:27,603 INFO
>>>>> org.apache.myriad.scheduler.event.handlers.ResourceOffersEventHandler:
>>>>> Received offers 1
>>>>> 2016-06-05 07:39:27,748 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
>>>>> container_1465112239753_0001_03_000001 Container Transitioned from
>>>>> RUNNING to COMPLETED
>>>>> 2016-06-05 07:39:27,748 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt:
>>>>> Completed container: container_1465112239753_0001_03_000001 in state:
>>>>> COMPLETED event:FINISHED
>>>>> 2016-06-05 07:39:27,748 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root
>>>>>        OPERATION=AM Released Container TARGET=SchedulerApp
>>>>> RESULT=SUCCESS  APPID=application_1465112239753_0001
>>>>> CONTAINERID=container_1465112239753_0001_03_000001
>>>>> 2016-06-05 07:39:27,748 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
>>>>> Released container container_1465112239753_0001_03_000001 of capacity
>>>>> <memory:0, vCores:0> on host slave2.testing.local:26688, which
>> currently
>>>>> has 0 containers, <memory:0, vCores:0> used and <memory:4096, vCores:1>
>>>>> available, release resources=true
>>>>> 2016-06-05 07:39:27,748 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>>>>> Application attempt appattempt_1465112239753_0001_000003 released
>>>>> container container_1465112239753_0001_03_000001 on node: host:
>>>>> slave2.testing.local:26688 #containers=0 available=<memory:4096,
>>>>> vCores:1> used=<memory:0, vCores:0> with event: FINISHED
>>>>> 2016-06-05 07:39:27,749 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> Updating application attempt appattempt_1465112239753_0001_000003 with
>>>>> final state: FAILED, and exit status: -103
>>>>> 2016-06-05 07:39:27,750 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> appattempt_1465112239753_0001_000003 State change from LAUNCHED to
>>>>> FINAL_SAVING
>>>>> 2016-06-05 07:39:27,751 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
>>>>> Unregistering app attempt : appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:27,751 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
>>>>> Application finished, removing password for
>>>>> appattempt_1465112239753_0001_000003
>>>>> 2016-06-05 07:39:27,751 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>>>> appattempt_1465112239753_0001_000003 State change from FINAL_SAVING to
>>>>> FAILED
>>>>> 2016-06-05 07:39:27,751 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The
>>>>> number of failed attempts is 2. The max attempts is 2
>>>>> 2016-06-05 07:39:27,753 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating
>>>>> application application_1465112239753_0001 with final state: FAILED
>>>>> 2016-06-05 07:39:27,756 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>>>>> application_1465112239753_0001 State change from ACCEPTED to
>> FINAL_SAVING
>>>>> 2016-06-05 07:39:27,757 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
>>>>> Application appattempt_1465112239753_0001_000003 is done.
>>>> finalState=FAILED
>>>>> 2016-06-05 07:39:27,757 INFO
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo:
>>>>> Application application_1465112239753_0001 requests cleared
>>>>> 2016-06-05 07:39:27,758 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
>>>>> Updating info for app: application_1465112239753_0001
>>>>> 2016-06-05 07:39:27,758 INFO
>>>>> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
>>>>> Application application_1465112239753_0001 failed 2 times due to AM
>>>>> Container for appattempt_1465112239753_0001_000003 exited with
>>>>> exitCode: -103
>>>>> For more detailed output, check application tracking page:
>>>>> http://master.testing.local:8088/cluster/app/application_1465112239753_0001
>>>>> Then, click on links to logs of each attempt.
>>>>> Diagnostics: Container
>>>>> [pid=3865,containerID=container_1465112239753_0001_03_000001] is
>> running
>>>>> beyond virtual memory limits. Current usage: 50.7 MB of 0B physical
>>>>> memory used; 2.6 GB of 0B virtual memory used. Killing container.
>>>>> Dump of the process-tree for container_1465112239753_0001_03_000001 :
>>>>>             |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>>>>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>>>>>             |- 3873 3865 3865 3865 (java) 80 26 2770927616 12614
>>>>> /usr/lib/jvm/java-8-openjdk-amd64/bin/java
>>>>>
>>>>
>> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp
>>>>> -Dlog4j.configuration=container-log4j.properties
>>>>>
>>>>
>> -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001
>>>>> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>>>>> -Dhadoop.root.logfile=syslog -Xmx1024m
>>>>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
>>>>>             |- 3865 3863 3865 3865 (bash) 0 1 11427840 354 /bin/bash -c
>>>>> /usr/lib/jvm/java-8-openjdk-amd64/bin/java
>>>>>
>>>>
>> -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1465112239753_0001/container_1465112239753_0001_03_000001/tmp
>>>>> -Dlog4j.configuration=container-log4j.properties
>>>>>
>>>>
>> -Dyarn.app.container.log.dir=/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001
>>>>> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
>>>>> -Dhadoop.root.logfile=syslog  -Xmx1024m
>>>>> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
>>>>>
>>>>
>> 1>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stdout
>>>>>
>>>>
>> 2>/srv/apps/hadoop-2.7.2/logs/userlogs/application_1465112239753_0001/container_1465112239753_0001_03_000001/stderr
>>>>>
>>>>>
>>>>> Container killed on request. Exit code is 143
>>>>> Container exited with a non-zero exit code 143
>>>>> Failing this attempt. Failing the application.
>>>>>
>>>>>
>>>>>
>>>>> On 03/06/16 15:52, yuliya Feldman wrote:
>>>>>> I believe you need at least one NM that is not subject to fine-grained
>>>>>> scaling.  If the total resources on the cluster are less than what a
>>>>>> single container needs for the AM, you won't be able to submit any app,
>>>>>> as the exception below tells you:
>>>>>> (Invalid resource request, requested memory < 0, or requested memory >
>>>>>> max configured, requestedMemory=1536, maxMemory=0)
>>>>>> I believe that by default, when starting a Myriad cluster, one NM with
>>>>>> non-zero capacity should start.
>>>>>> In addition, check the RM log to see whether offers with resources are
>>>>>> coming to the RM - this info should be in the log.
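>>>>>>
>>>>>> The RM-side check that produces this is essentially of the following
>>>>>> shape (a simplified sketch, not the exact SchedulerUtils code): the
>>>>>> request is rejected when it asks for more memory than the scheduler's
>>>>>> current maximum, and with only zero-profile NMs registered that maximum
>>>>>> is 0.
>>>>>>
>>>>>>     // Illustrative sketch; the real code throws InvalidResourceRequestException.
>>>>>>     public class ValidateRequestSketch {
>>>>>>         static void validate(int requestedMemoryMb, int maxMemoryMb) {
>>>>>>             if (requestedMemoryMb < 0 || requestedMemoryMb > maxMemoryMb) {
>>>>>>                 throw new IllegalArgumentException(
>>>>>>                     "Invalid resource request, requested memory < 0, or requested"
>>>>>>                     + " memory > max configured, requestedMemory=" + requestedMemoryMb
>>>>>>                     + ", maxMemory=" + maxMemoryMb);
>>>>>>             }
>>>>>>         }
>>>>>>
>>>>>>         public static void main(String[] args) {
>>>>>>             // The 1536 MB AM request against a 0 MB cluster maximum fails
>>>>>>             // before anything is scheduled.
>>>>>>             validate(1536, 0);
>>>>>>         }
>>>>>>     }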
>>>>>>
>>>>>>   From: Stephen Gran <stephen.g...@piksel.com>
>>>>>>   To: "dev@myriad.incubator.apache.org" <dev@myriad.incubator.apache.org>
>>>>>>   Sent: Friday, June 3, 2016 1:29 AM
>>>>>>   Subject: problem getting fine grained scaling working
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to get fine-grained scaling going on a test Mesos cluster.  I
>>>>>> have a single master and 2 agents.  I am running 2 node managers with
>>>>>> the zero profile, one per agent.  I can see both of them in the RM UI
>>>>>> reporting correctly as having 0 resources.
>>>>>>
>>>>>> I'm getting stack traces when I try to launch a sample application,
>>>>>> though.  I feel like I'm just missing something obvious somewhere - can
>>>>>> anyone shed any light?
>>>>>>
>>>>>> This is on a build of yesterday's git head.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> root@master:/srv/apps/hadoop# bin/yarn jar
>>>>>> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar teragen
>> 10000
>>>>>> /outDir
>>>>>> 16/06/03 08:23:33 INFO client.RMProxy: Connecting to ResourceManager
>> at
>>>>>> master.testing.local/10.0.5.3:8032
>>>>>> 16/06/03 08:23:34 INFO terasort.TeraSort: Generating 10000 using 2
>>>>>> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: number of splits:2
>>>>>> 16/06/03 08:23:34 INFO mapreduce.JobSubmitter: Submitting tokens for
>>>>>> job: job_1464902078156_0001
>>>>>> 16/06/03 08:23:35 INFO mapreduce.JobSubmitter: Cleaning up the staging
>>>>>> area /tmp/hadoop-yarn/staging/root/.staging/job_1464902078156_0001
>>>>>> java.io.IOException:
>>>>>> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
>>>>>> Invalid resource request, requested memory < 0, or requested memory >
>>>>>> max configured, requestedMemory=1536, maxMemory=0
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>>>>>             at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>>>>>             at
>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>>>>>             at
>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>>>>>             at java.security.AccessController.doPrivileged(Native
>> Method)
>>>>>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>>>             at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>>>>>>
>>>>>>             at
>>>>> org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
>>>>>>             at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>>>>>>             at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>>>>>>             at java.security.AccessController.doPrivileged(Native
>> Method)
>>>>>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>>>             at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>>>>>>             at
>>>>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>>>>>>             at
>>>>> org.apache.hadoop.examples.terasort.TeraGen.run(TeraGen.java:301)
>>>>>>             at
>> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>>>>             at
>>>>> org.apache.hadoop.examples.terasort.TeraGen.main(TeraGen.java:305)
>>>>>>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>             at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>>>>>>             at
>>>>> org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>>>>>>             at
>>>>> org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>>>>>>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>             at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>             at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>>>>>             at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>>>>> Caused by:
>>>>>> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException:
>>>>>> Invalid resource request, requested memory < 0, or requested memory >
>>>>>> max configured, requestedMemory=1536, maxMemory=0
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>>>>>             at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>>>>>             at
>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>>>>>             at
>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>>>>>             at java.security.AccessController.doPrivileged(Native
>> Method)
>>>>>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>>>             at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>>>>>>
>>>>>>             at
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>>>> Method)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>             at
>>>>> java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>>>>>>             at
>>>>>
>> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:239)
>>>>>>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>             at java.lang.reflect.Method.invoke(Method.java:497)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>             at com.sun.proxy.$Proxy13.submitApplication(Unknown Source)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:253)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
>>>>>>             at
>>>>> org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:290)
>>>>>>             ... 24 more
>>>>>> Caused by:
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>>>>>> Invalid resource request, requested memory < 0, or requested memory >
>>>>>> max configured, requestedMemory=1536, maxMemory=0
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:268)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:228)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:236)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:329)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:281)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:580)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>>>>>             at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>>>>>             at
>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>>>>>             at
>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>>>>>             at java.security.AccessController.doPrivileged(Native
>> Method)
>>>>>>             at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>>>>>             at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>>>>>>
>>>>>>             at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>>>>>>             at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>>>>>>             at com.sun.proxy.$Proxy12.submitApplication(Unknown Source)
>>>>>>             at
>>>>>>
>>>>>
>>>>
>> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:236)
>>>>>>             ... 34 more
>>>>>>
>>>>>>
>>>>>> Cheers,

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com
