OK, now I see that the container ID is formed incorrectly.

2015-11-27 06:18:02,991 INFO  rmcontainer.RMContainerImpl
(RMContainerImpl.java:handle(417)) -
*container_e08_1448630699339_0003_01_000003* Container Transitioned from
NEW to ALLOCATED
2015-11-27 06:18:02,991 INFO  resourcemanager.RMAuditLogger
(RMAuditLogger.java:logSuccess(106)) - USER=gpadmin OPERATION=AM Allocated
Container TARGET=SchedulerApp RESULT=SUCCESS
APPID=application_1448630699339_0003
CONTAINERID=container_e08_1448630699339_0003_01_000003
2015-11-27 06:18:02,991 INFO  scheduler.SchedulerNode
(SchedulerNode.java:allocateContainer(154)) - Assigned container
container_e08_1448630699339_0003_01_000003 of capacity <memory:512,
vCores:1> on host hawq20.pivotal.io:45454, which has 3 containers,
<memory:1536, vCores:3> used and <memory:4309, vCores:13> available after
allocation
2015-11-27 06:18:02,991 INFO  capacity.LeafQueue
(LeafQueue.java:assignContainer(1616)) - assignedContainer application
attempt=appattempt_1448630699339_0003_000001 container=Container:
[ContainerId: container_e08_1448630699339_0003_01_000003, NodeId:
hawq20.pivotal.io:45454, NodeHttpAddress: hawq20.pivotal.io:8042, Resource:
<memory:512, vCores:1>, Priority: 1, Token: null, ] queue=default:
capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1024, vCores:2>,
usedCapacity=0.17519248, absoluteUsedCapacity=0.17519248, numApps=1,
numContainers=2 clusterResource=<memory:5845, vCores:16>
2015-11-27 06:18:03,850 INFO  rmcontainer.RMContainerImpl
(RMContainerImpl.java:handle(417)) -
container_e08_1448630699339_0003_01_000003 Container Transitioned from
ALLOCATED to ACQUIRED
2015-11-27 06:18:03,877 WARN  resourcemanager.RMAuditLogger
(RMAuditLogger.java:logFailure(215)) - USER=gpadmin IP=192.168.220.128
OPERATION=AM
Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to
release container not owned by app or with invalid id. PERMISSIONS=Unauthorized
access or invalid container APPID=application_1448630699339_0003
CONTAINERID=*container_1448630699339_0003_01_000003*

The container is allocated with the ID
container_e08_1448630699339_0003_01_000003, but when HAWQ tries to release
it, it uses the ID container_1448630699339_0003_01_000003 (the "e08"
segment is missing). The release fails because no container with that ID
exists.
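
To make the mismatch concrete, here is a rough Python sketch that splits both
IDs into fields. The pattern is only inferred from the two IDs in the RM log
above, not taken from the YARN or HAWQ sources, and the names
parse_container_id and strip_epoch are hypothetical helpers of mine.
strip_epoch just mimics the observed symptom; I am not claiming this is what
HAWQ actually does internally.

```python
import re

# Assumed layout, inferred from the log lines above:
#   container_[e<epoch>_]<cluster_ts>_<app>_<attempt>_<container>
CONTAINER_ID = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<container>\d+)$"
)


def parse_container_id(text):
    """Split a container ID string into its fields (hypothetical helper)."""
    match = CONTAINER_ID.match(text)
    if match is None:
        raise ValueError("not a container ID: %r" % text)
    return match.groupdict()


def strip_epoch(text):
    """Drop the e<epoch>_ segment, reproducing the symptom seen in the log."""
    return re.sub(r"^container_e\d+_", "container_", text)


allocated = "container_e08_1448630699339_0003_01_000003"  # ID the RM allocated
released = "container_1448630699339_0003_01_000003"       # ID in the failed release

a = parse_container_id(allocated)
r = parse_container_id(released)

# Report which fields differ between the two IDs.
for field in a:
    if a[field] != r[field]:
        print("%s differs: %r vs %r" % (field, a[field], r[field]))
```

Only the epoch field differs between the two IDs; every other field is
identical, so the ID in the release request is the allocated ID with the
epoch segment removed.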


On Fri, Nov 27, 2015 at 2:29 PM, Alexey Grishchenko <programme...@gmail.com>
wrote:

> You can also find the HAWQ log in the attachment. As expected, HAWQ tries to
> releaseResources, but somehow it returns success.
>
> On Fri, Nov 27, 2015 at 2:10 PM, Alexey Grishchenko <
> programme...@gmail.com> wrote:
>
>> Hi, guys
>>
>> I've got an issue with running HAWQ 2.0 on YARN.
>> On startup, HAWQ successfully registers a YARN application and starts
>> allocating containers, but it never stops allocating them. Regardless of
>> the amount of vcores and memory you give YARN to manage, HAWQ allocates
>> containers until it eats all the available resources. After this, all
>> queries start to hang.
>>
>> What I can see in the RM logs (full log is attached):
>> 2015-11-27 05:34:59,214 WARN  resourcemanager.RMAuditLogger
>> (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released
>> Container TARGET=Scheduler        RESULT=FAILURE  DESCRIPTION=Trying to
>> release container not owned by app or with invalid id.
>>  PERMISSIONS=Unauthorized access or invalid container
>>  APPID=application_1448630699339_0002
>>  CONTAINERID=container_1448630699339_0002_01_000008
>>
>> Do you know the possible reason for this?
>> Using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0
>> (YARN 2.7.1)
>>
>> --
>> Alexey Grishchenko, http://0x0fff.com
>>
>
>
>
> --
> Alexey Grishchenko, http://0x0fff.com
>



-- 
Alexey Grishchenko, http://0x0fff.com
