OK, now I see that the container name is formed incorrectly on release:

2015-11-27 06:18:02,991 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(417)) - *container_e08_1448630699339_0003_01_000003* Container Transitioned from NEW to ALLOCATED
2015-11-27 06:18:02,991 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) - USER=gpadmin OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1448630699339_0003 CONTAINERID=container_e08_1448630699339_0003_01_000003
2015-11-27 06:18:02,991 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(154)) - Assigned container container_e08_1448630699339_0003_01_000003 of capacity <memory:512, vCores:1> on host hawq20.pivotal.io:45454, which has 3 containers, <memory:1536, vCores:3> used and <memory:4309, vCores:13> available after allocation
2015-11-27 06:18:02,991 INFO capacity.LeafQueue (LeafQueue.java:assignContainer(1616)) - assignedContainer application attempt=appattempt_1448630699339_0003_000001 container=Container: [ContainerId: container_e08_1448630699339_0003_01_000003, NodeId: hawq20.pivotal.io:45454, NodeHttpAddress: hawq20.pivotal.io:8042, Resource: <memory:512, vCores:1>, Priority: 1, Token: null, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1024, vCores:2>, usedCapacity=0.17519248, absoluteUsedCapacity=0.17519248, numApps=1, numContainers=2 clusterResource=<memory:5845, vCores:16>
2015-11-27 06:18:03,850 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(417)) - container_e08_1448630699339_0003_01_000003 Container Transitioned from ALLOCATED to ACQUIRED
2015-11-27 06:18:03,877 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(215)) - USER=gpadmin IP=192.168.220.128 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1448630699339_0003 CONTAINERID=*container_1448630699339_0003_01_000003*
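
To make the mismatch in the log above concrete, here is a minimal Python sketch that parses the two highlighted IDs and compares them field by field. It is only an illustration under stated assumptions: as far as I know, the `e08` segment is the ResourceManager epoch that YARN adds to container IDs once the RM has been restarted with recovery enabled, and an ID without that segment corresponds to epoch 0. The regex, `parse_container_id` and `same_except_epoch` are hypothetical helpers written for this message; they are not part of HAWQ or YARN, and nothing here shows where in HAWQ the segment gets lost.

```python
import re

# Assumed layout of a YARN container ID string:
#   container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
# The "e<epoch>_" part is optional; a missing epoch is treated as 0.
CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<seq>\d+)$"
)


def parse_container_id(text):
    """Split a container ID string into integer fields (hypothetical helper)."""
    match = CONTAINER_ID.match(text)
    if match is None:
        raise ValueError("not a container ID: %r" % text)
    # Groups that did not participate (the optional epoch) default to "0".
    return {name: int(value) for name, value in match.groupdict(default="0").items()}


def same_except_epoch(first, second):
    """True when two IDs agree on every field but carry different epochs."""
    a, b = parse_container_id(first), parse_container_id(second)
    epoch_a, epoch_b = a.pop("epoch"), b.pop("epoch")
    return a == b and epoch_a != epoch_b


# IDs copied from the RM log: the one YARN allocated and the one in the failed release.
allocated = "container_e08_1448630699339_0003_01_000003"
released = "container_1448630699339_0003_01_000003"

print(parse_container_id(allocated))
print(parse_container_id(released))
print(same_except_epoch(allocated, released))
```

If the assumption about the epoch holds, the two IDs name different containers from the RM's point of view, which would be consistent with the "not owned by app or with invalid id" failure.
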
The container is allocated with the name container_e08_1448630699339_0003_01_000003, but when HAWQ tries to release it, it uses the name container_1448630699339_0003_01_000003. That is wrong, because no container with this name exists.

On Fri, Nov 27, 2015 at 2:29 PM, Alexey Grishchenko <programme...@gmail.com> wrote:

> You can also find the HAWQ log in the attachment. As expected, HAWQ tries to
> releaseResources, but somehow it returns success
>
> On Fri, Nov 27, 2015 at 2:10 PM, Alexey Grishchenko <
> programme...@gmail.com> wrote:
>
>> Hi, guys
>>
>> I've got an issue with running HAWQ 2.0 on YARN.
>> On startup, HAWQ successfully registers a YARN application and starts
>> allocating containers. But it never stops allocating them. Regardless of the
>> amount of vcores and memory you give YARN to manage, HAWQ allocates
>> containers until it eats all the available resources. After this, all the
>> queries start to hang.
>>
>> What I can see in the RM logs (full log is attached):
>> 2015-11-27 05:34:59,214 WARN resourcemanager.RMAuditLogger
>> (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released
>> Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to
>> release container not owned by app or with invalid id.
>> PERMISSIONS=Unauthorized access or invalid container
>> APPID=application_1448630699339_0002
>> CONTAINERID=container_1448630699339_0002_01_000008
>>
>> Do you know the possible reason for this?
>> Using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0
>> (YARN 2.7.1)
>>
>> --
>> Alexey Grishchenko, http://0x0fff.com
>>
>
>
>
> --
> Alexey Grishchenko, http://0x0fff.com
>

--
Alexey Grishchenko, http://0x0fff.com