[ https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961560#comment-14961560 ]

Sarjeet Singh commented on MYRIAD-153:
--------------------------------------

More details... Here are the NM, RM, and Mesos master logs for container 
"container_1442507909665_0002_01_000012":

[node-1]# grep container_1442507909665_0002_01_000012 
task-nm.zero.28b0d3b6-79eb-44c7-be99-aa7157568d8e.stderr
15/09/17 10:04:40 INFO containermanager.ContainerManagerImpl: Start request for 
container_1442507909665_0002_01_000012 by user mapr
15/09/17 10:04:40 INFO application.ApplicationImpl: Adding 
container_1442507909665_0002_01_000012 to application 
application_1442507909665_0002
15/09/17 10:04:40 INFO nodemanager.NMAuditLogger: USER=mapr     
IP=10.10.101.116        OPERATION=Start Container Request       
TARGET=ContainerManageImpl      RESULT=SUCCESS  
APPID=application_1442507909665_0002    
CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:40 INFO container.ContainerImpl: Container 
container_1442507909665_0002_01_000012 transitioned from NEW to LOCALIZING
15/09/17 10:04:40 INFO container.ContainerImpl: Container 
container_1442507909665_0002_01_000012 transitioned from LOCALIZING to LOCALIZED
15/09/17 10:04:40 INFO container.ContainerImpl: Container 
container_1442507909665_0002_01_000012 transitioned from LOCALIZED to RUNNING
15/09/17 10:04:40 INFO monitor.ContainersMonitorImpl: Starting 
resource-monitoring for container_1442507909665_0002_01_000012
15/09/17 10:04:40 INFO monitor.ContainersMonitorImpl: Memory usage of 
ProcessTree 26207 for container-id container_1442507909665_0002_01_000012: 62.0 
MB of 1 GB physical memory used; 1.7 GB of 2.1 GB virtual memory used
15/09/17 10:04:43 INFO monitor.ContainersMonitorImpl: Memory usage of 
ProcessTree 26207 for container-id container_1442507909665_0002_01_000012: 
271.5 MB of 1 GB physical memory used; 1.8 GB of 2.1 GB virtual memory used
15/09/17 10:04:46 INFO monitor.ContainersMonitorImpl: Memory usage of 
ProcessTree 26207 for container-id container_1442507909665_0002_01_000012: 
343.3 MB of 1 GB physical memory used; 1.8 GB of 2.1 GB virtual memory used
15/09/17 10:04:48 INFO containermanager.ContainerManagerImpl: Stopping 
container with container Id: container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO nodemanager.NMAuditLogger: USER=mapr     
IP=10.10.101.116        OPERATION=Stop Container Request        
TARGET=ContainerManageImpl      RESULT=SUCCESS  
APPID=application_1442507909665_0002    
CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO container.ContainerImpl: Container 
container_1442507909665_0002_01_000012 transitioned from RUNNING to KILLING
15/09/17 10:04:48 INFO launcher.ContainerLaunch: Cleaning up container 
container_1442507909665_0002_01_000012
15/09/17 10:04:48 WARN nodemanager.LinuxContainerExecutor: Exit code from 
container container_1442507909665_0002_01_000012 is : 143
15/09/17 10:04:48 INFO container.ContainerImpl: Container 
container_1442507909665_0002_01_000012 transitioned from KILLING to 
CONTAINER_CLEANEDUP_AFTER_KILL
15/09/17 10:04:48 INFO nodemanager.LinuxContainerExecutor: Deleting absolute 
path : 
/tmp/hadoop-mapr/nm-local-dir/usercache/mapr/appcache/application_1442507909665_0002/container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO nodemanager.NMAuditLogger: USER=mapr     
OPERATION=Container Finished - Killed   TARGET=ContainerImpl    RESULT=SUCCESS  
APPID=application_1442507909665_0002    
CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO container.ContainerImpl: Container 
container_1442507909665_0002_01_000012 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
15/09/17 10:04:48 INFO application.ApplicationImpl: Removing 
container_1442507909665_0002_01_000012 from application 
application_1442507909665_0002
15/09/17 10:04:49 INFO monitor.ContainersMonitorImpl: Stopping 
resource-monitoring for container_1442507909665_0002_01_000012
15/09/17 10:05:31 INFO nodemanager.NodeStatusUpdaterImpl: Removed completed 
containers from NM context: [container_1442507909665_0002_01_000009, 
container_1442507909665_0002_01_000012]


[node-1]# grep container_1442507909665_0002_01_000012 
testrm.646ddf2c-5d5a-11e5-9651-0cc47a587d16.stderr
15/09/17 10:04:11 INFO rmcontainer.RMContainerImpl: 
container_1442507909665_0002_01_000012 Container Transitioned from NEW to 
RESERVED
15/09/17 10:04:11 INFO fair.FSSchedulerNode: Reserved container 
container_1442507909665_0002_01_000012 on node host: qa101-117.qa.lab:31004 
#containers=3 available=<memory:-2048, vCores:-2, disks:2.5> used=<memory:3072, 
vCores:3, disks:1.5> for application 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@18a02c01
15/09/17 10:04:17 INFO rmcontainer.RMContainerImpl: 
container_1442507909665_0002_01_000012 Container Transitioned from NEW to 
ALLOCATED
15/09/17 10:04:17 INFO resourcemanager.RMAuditLogger: USER=mapr OPERATION=AM 
Allocated Container        TARGET=SchedulerApp     RESULT=SUCCESS  
APPID=application_1442507909665_0002    
CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:17 INFO scheduler.SchedulerNode: Assigned container 
container_1442507909665_0002_01_000012 of capacity <memory:1024, vCores:1, 
disks:0.5> on host qa101-117.qa.lab:31004, which has 4 containers, 
<memory:4096, vCores:4, disks:2.0> used and <memory:86765, vCores:21, 
disks:2.0> available after allocation
15/09/17 10:04:17 WARN handlers.StatusUpdateEventHandler: Task: value: 
"yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:30 INFO rmcontainer.RMContainerImpl: 
container_1442507909665_0002_01_000012 Container Transitioned from ALLOCATED to 
ACQUIRED
15/09/17 10:04:40 WARN handlers.StatusUpdateEventHandler: Task: value: 
"yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:48 WARN handlers.StatusUpdateEventHandler: Task: value: 
"yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:50 INFO rmcontainer.RMContainerImpl: 
container_1442507909665_0002_01_000012 Container Transitioned from ACQUIRED to 
RUNNING
15/09/17 10:04:50 WARN handlers.StatusUpdateEventHandler: Task: value: 
"yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:51 INFO rmcontainer.RMContainerImpl: 
container_1442507909665_0002_01_000012 Container Transitioned from RUNNING to 
COMPLETED
15/09/17 10:04:51 INFO fair.FSAppAttempt: Completed container: 
container_1442507909665_0002_01_000012 in state: COMPLETED event:FINISHED
15/09/17 10:04:51 INFO resourcemanager.RMAuditLogger: USER=mapr OPERATION=AM 
Released Container TARGET=SchedulerApp     RESULT=SUCCESS  
APPID=application_1442507909665_0002    
CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:51 INFO scheduler.SchedulerNode: Released container 
container_1442507909665_0002_01_000012 of capacity <memory:1024, vCores:1, 
disks:0.5> on host qa101-117.qa.lab:31004, which currently has 2 containers, 
<memory:2048, vCores:2, disks:1.0> used and <memory:-2048, vCores:-2, 
disks:3.0> available, release resources=true
15/09/17 10:04:51 INFO fair.FairScheduler: Application attempt 
appattempt_1442507909665_0002_000001 released container 
container_1442507909665_0002_01_000012 on node: host: qa101-117.qa.lab:31004 
#containers=2 available=<memory:-2048, vCores:-2, disks:3.0> used=<memory:2048, 
vCores:2, disks:1.0> with event: FINISHED


[node-1]# grep container_1442507909665_0002_01_000012 mesos-master.INFO
I0917 10:04:17.006140  5563 master.hpp:159] Adding task 
yarn_container_1442507909665_0002_01_000012 with resources cpus(*):1; 
mem(*):1024 on slave 20150916-104543-1969555978-5050-5493-S0 (qa101-117.qa.lab)
I0917 10:04:17.006294  5563 master.cpp:2835] Launching task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at 
[email protected]:38037 with 
resources cpus(*):1; mem(*):1024 on slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:17.013324  5563 master.cpp:3758] Status update TASK_RUNNING (UUID: 
11034a25-de90-4950-9d39-0a775280dd01) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 from slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:17.013465  5563 master.cpp:3797] Forwarding status update 
TASK_RUNNING (UUID: 11034a25-de90-4950-9d39-0a775280dd01) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001
I0917 10:04:17.013741  5563 master.cpp:5178] Updating the latest state of task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 to TASK_RUNNING
I0917 10:04:17.014492  5562 master.cpp:3158] Processing ACKNOWLEDGE call 
11034a25-de90-4950-9d39-0a775280dd01 for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at 
[email protected]:38037 on slave 
20150916-104543-1969555978-5050-5493-S0
I0917 10:04:40.035140  5538 master.cpp:3758] Status update TASK_RUNNING (UUID: 
77348e87-d3cd-4416-b15a-72fe79a8a3f3) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 from slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:40.035399  5538 master.cpp:3797] Forwarding status update 
TASK_RUNNING (UUID: 77348e87-d3cd-4416-b15a-72fe79a8a3f3) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001
I0917 10:04:40.035606  5538 master.cpp:5178] Updating the latest state of task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 to TASK_RUNNING
I0917 10:04:40.036656  5538 master.cpp:3158] Processing ACKNOWLEDGE call 
77348e87-d3cd-4416-b15a-72fe79a8a3f3 for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at 
[email protected]:38037 on slave 
20150916-104543-1969555978-5050-5493-S0
I0917 10:04:48.146806  5537 master.cpp:3758] Status update TASK_FINISHED (UUID: 
f2e39e4b-6678-479b-861c-fd58b88e8e30) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 from slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:48.147330  5537 master.cpp:3797] Forwarding status update 
TASK_FINISHED (UUID: f2e39e4b-6678-479b-861c-fd58b88e8e30) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001
I0917 10:04:48.147701  5537 master.cpp:5178] Updating the latest state of task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 to TASK_FINISHED
I0917 10:04:48.149219  5537 master.cpp:5246] Removing task 
yarn_container_1442507909665_0002_01_000012 with resources cpus(*):1; 
mem(*):1024 of framework 20150916-104543-1969555978-5050-5493-0001 on slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:48.149515  5537 master.cpp:3158] Processing ACKNOWLEDGE call 
f2e39e4b-6678-479b-861c-fd58b88e8e30 for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at 
[email protected]:38037 on slave 
20150916-104543-1969555978-5050-5493-S0
I0917 10:04:50.011129  5559 master.hpp:159] Adding task 
yarn_container_1442507909665_0002_01_000012 with resources cpus(*):1; 
mem(*):1024 on slave 20150916-104543-1969555978-5050-5493-S0 (qa101-117.qa.lab)
I0917 10:04:50.011324  5559 master.cpp:2835] Launching task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at 
[email protected]:38037 with 
resources cpus(*):1; mem(*):1024 on slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:50.019701  5554 master.cpp:3758] Status update TASK_RUNNING (UUID: 
cd834102-6d12-46e9-be42-b76b3341ad36) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 from slave 
20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 
(qa101-117.qa.lab)
I0917 10:04:50.019907  5554 master.cpp:3797] Forwarding status update 
TASK_RUNNING (UUID: cd834102-6d12-46e9-be42-b76b3341ad36) for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001
I0917 10:04:50.020102  5554 master.cpp:5178] Updating the latest state of task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 to TASK_RUNNING
I0917 10:04:50.020922  5554 master.cpp:3158] Processing ACKNOWLEDGE call 
cd834102-6d12-46e9-be42-b76b3341ad36 for task 
yarn_container_1442507909665_0002_01_000012 of framework 
20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at 
[email protected]:38037 on slave 
20150916-104543-1969555978-5050-5493-S0
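
Reading the mesos-master log above: the placeholder task 
yarn_container_1442507909665_0002_01_000012 gets TASK_FINISHED at 10:04:48, but two 
seconds later (10:04:50) the same task ID is added and launched again, and the last 
update in this excerpt leaves it in TASK_RUNNING. That matches the placeholder tasks 
left RUNNING in the Mesos UI. Just to illustrate the kind of bookkeeping that would 
avoid leaking these tasks, here is a minimal sketch (class and method names are 
hypothetical, not Myriad's actual code; only the org.apache.mesos Protos and 
SchedulerDriver calls are real API): a scheduler-side janitor that remembers which 
placeholder task is live for each YARN container and asks Mesos to kill any 
placeholder still tracked when YARN reports that container finished.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.mesos.Protos;
import org.apache.mesos.SchedulerDriver;

/**
 * Hypothetical sketch, not Myriad code: remembers the placeholder Mesos task
 * launched for each YARN container and kills any placeholder that is still
 * tracked once YARN reports that container as finished.
 */
public class PlaceholderTaskJanitor {

  // YARN container id -> placeholder task id we believe is live on Mesos
  private final Map<String, Protos.TaskID> livePlaceholders = new ConcurrentHashMap<>();

  // Call when a placeholder task is launched for a YARN container, e.g.
  // "container_1442507909665_0002_01_000012" -> task "yarn_container_...".
  public void placeholderLaunched(String yarnContainerId) {
    Protos.TaskID taskId = Protos.TaskID.newBuilder()
        .setValue("yarn_" + yarnContainerId)
        .build();
    livePlaceholders.put(yarnContainerId, taskId);
  }

  // Call from the Mesos statusUpdate() callback: forget placeholders that
  // really did reach a terminal state.
  public void statusUpdate(Protos.TaskStatus status) {
    switch (status.getState()) {
      case TASK_FINISHED:
      case TASK_FAILED:
      case TASK_KILLED:
      case TASK_LOST:
        livePlaceholders.values().remove(status.getTaskId());
        break;
      default:
        // non-terminal update, keep tracking
        break;
    }
  }

  // Call when YARN reports the container complete; if the placeholder is
  // still tracked as live (as in the log above), ask Mesos to kill it
  // instead of leaving it RUNNING.
  public void yarnContainerCompleted(SchedulerDriver driver, String yarnContainerId) {
    Protos.TaskID stale = livePlaceholders.remove(yarnContainerId);
    if (stale != null) {
      driver.killTask(stale);
    }
  }
}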

> Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
> -----------------------------------------------------------------------------
>
>                 Key: MYRIAD-153
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-153
>             Project: Myriad
>          Issue Type: Bug
>            Reporter: Sarjeet Singh
>         Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched via FGS are still 
> in the RUNNING state on Mesos. These container tasks are not cleaned up 
> properly after the job finishes completely.
> See the attached screenshot of the Mesos UI with placeholder tasks still running.


