[
https://issues.apache.org/jira/browse/HDDS-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627219#comment-17627219
]
Sadanand Shenoy commented on HDDS-6126:
---------------------------------------
NM log
{code:java}
2022-11-01T14:45:38.462945577Z 2022-11-01 14:45:38 INFO
ContainerManagerImpl:960 - Stopping container with container Id:
container_1667306908598_0003_01_000005
2022-11-01T14:45:38.463297250Z 2022-11-01 14:45:38 INFO ContainerImpl:1135 -
Container container_1667306908598_0003_01_000005 transitioned from RUNNING to
KILLING
2022-11-01T14:45:38.463616665Z 2022-11-01 14:45:38 INFO ContainerLaunch:370 -
Cleaning up container container_1667306908598_0003_01_000005
2022-11-01T14:45:38.464046248Z 2022-11-01 14:45:38 INFO NMAuditLogger:89 -
USER=hadoop IP=172.20.0.4 OPERATION=Stop Container Request
TARGET=ContainerManageImpl RESULT=SUCCESS
APPID=application_1667306908598_0003
CONTAINERID=container_1667306908598_0003_01_000005
2022-11-01T14:45:38.476263911Z 2022-11-01 14:45:38 WARN
DefaultContainerExecutor:224 - Exit code from container
container_1667306908598_0003_01_000005 is : 143
2022-11-01T14:45:38.489539300Z 2022-11-01 14:45:38 INFO ContainerImpl:1135 -
Container container_1667306908598_0003_01_000005 transitioned from KILLING to
CONTAINER_CLEANEDUP_AFTER_KILL
2022-11-01T14:45:38.490021365Z 2022-11-01 14:45:38 INFO NMAuditLogger:89 -
USER=hadoop OPERATION=Container Finished - Killed TARGET=ContainerImpl
RESULT=SUCCESS APPID=application_1667306908598_0003
CONTAINERID=container_1667306908598_0003_01_000005
2022-11-01T14:45:38.490184714Z 2022-11-01 14:45:38 INFO ContainerImpl:1135 -
Container container_1667306908598_0003_01_000005 transitioned from
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2022-11-01T14:45:38.490830742Z 2022-11-01 14:45:38 INFO ApplicationImpl:347 -
Removing container_1667306908598_0003_01_000005 from application
application_1667306908598_0003
2022-11-01T14:45:38.491627397Z 2022-11-01 14:45:38 INFO
ContainersMonitorImpl:196 - Neither virutal-memory nor physical-memory
monitoring is needed. Not running the monitor-thread
2022-11-01T14:45:38.492801085Z 2022-11-01 14:45:38 INFO AuxServices:196 - Got
event CONTAINER_STOP for appId application_1667306908598_0003
----IDLE for ~10 min-----
2022-11-01T14:58:23.116387795Z 2022-11-01 14:58:23 INFO Server:1316 - Auth
successful for appattempt_1667306908598_0003_000001 (auth:SIMPLE)
2022-11-01T14:58:23.125230150Z 2022-11-01 14:58:23 INFO
ContainerManagerImpl:960 - Stopping container with container Id:
container_1667306908598_0003_01_000001
2022-11-01T14:58:23.125819375Z 2022-11-01 14:58:23 INFO NMAuditLogger:89 -
USER=hadoop IP=172.20.0.7 OPERATION=Stop Container Request
TARGET=ContainerManageImpl RESULT=SUCCESS
APPID=application_1667306908598_0003
CONTAINERID=container_1667306908598_0003_01_000001
2022-11-01T14:58:23.126914642Z 2022-11-01 14:58:23 INFO ContainerImpl:1135 -
Container container_1667306908598_0003_01_000001 transitioned from RUNNING to
KILLING
2022-11-01T14:58:23.127085783Z 2022-11-01 14:58:23 INFO ContainerLaunch:370 -
Cleaning up container container_1667306908598_0003_01_000001
2022-11-01T14:58:23.138359689Z 2022-11-01 14:58:23 WARN
DefaultContainerExecutor:224 - Exit code from container
container_1667306908598_0003_01_000001 is : 143
2022-11-01T14:58:23.154846877Z 2022-11-01 14:58:23 INFO ContainerImpl:1135 -
Container container_1667306908598_0003_01_000001 transitioned from KILLING to
CONTAINER_CLEANEDUP_AFTER_KILL
2022-11-01T14:58:23.156116528Z 2022-11-01 14:58:23 INFO NMAuditLogger:89 -
USER=hadoop OPERATION=Container Finished - Killed TARGET=ContainerImpl
RESULT=SUCCESS APPID=application_1667306908598_0003
CONTAINERID=container_1667306908598_0003_01_000001
2022-11-01T14:58:23.156486799Z 2022-11-01 14:58:23 INFO ContainerImpl:1135 -
Container container_1667306908598_0003_01_000001 transitioned from
CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2022-11-01T14:58:23.156859401Z 2022-11-01 14:58:23 INFO ApplicationImpl:347 -
Removing container_1667306908598_0003_01_000001 from application
application_1667306908598_0003
2022-11-01T14:58:23.156881610Z 2022-11-01 14:58:23 INFO
ContainersMonitorImpl:196 - Neither virutal-memory nor physical-memory
monitoring is needed. Not running the monitor-thread
2022-11-01T14:58:23.157438464Z 2022-11-01 14:58:23 INFO AuxServices:196 - Got
event CONTAINER_STOP for appId application_1667306908598_0003
2022-11-01T14:58:23.159658999Z 2022-11-01 14:58:23 INFO ApplicationImpl:464 -
Application application_1667306908598_0003 transitioned from RUNNING to
APPLICATION_RESOURCES_CLEANINGUP
2022-11-01T14:58:23.161177845Z 2022-11-01 14:58:23 INFO AuxServices:196 - Got
event APPLICATION_STOP for appId application_1667306908598_0003
2022-11-01T14:58:23.161567999Z 2022-11-01 14:58:23 INFO ApplicationImpl:464 -
Application application_1667306908598_0003 transitioned from
APPLICATION_RESOURCES_CLEANINGUP to FINISHED{code}
RM log
{code:java}
2022-11-01 14:45:39 INFO RMAppAttemptImpl:1206 - Updating application attempt
appattempt_1667306908598_0003_000001 with final state: FINISHING, and exit
status: -1000
rm_1 | 2022-11-01 14:45:39 INFO RMAppAttemptImpl:813 -
appattempt_1667306908598_0003_000001 State change from RUNNING to FINAL_SAVING
rm_1 | 2022-11-01 14:45:39 INFO RMAppImpl:993 - Updating application
application_1667306908598_0003 with final state: FINISHING
rm_1 | 2022-11-01 14:45:39 INFO RMAppImpl:715 -
application_1667306908598_0003 State change from RUNNING to FINAL_SAVING
rm_1 | 2022-11-01 14:45:39 INFO RMAppAttemptImpl:813 -
appattempt_1667306908598_0003_000001 State change from FINAL_SAVING to FINISHING
rm_1 | 2022-11-01 14:45:39 INFO RMStateStore:223 - Updating info for
app: application_1667306908598_0003
rm_1 | 2022-11-01 14:45:39 INFO RMAppImpl:715 -
application_1667306908598_0003 State change from FINAL_SAVING to FINISHING
scm_1 | 2022-11-01 14:45:39,651 [Under Replicated Processor] INFO
replication.UnderReplicatedProcessor: Processed 0 under replicated containers,
failed processing 0
scm_1 | 2022-11-01 14:45:39,651 [Over Replicated Processor] INFO
replication.OverReplicatedProcessor: Processed 0 over replicated containers,
failed processing 0
rm_1 | 2022-11-01 14:45:40 INFO ApplicationMasterService:354 -
application_1667306908598_0003 unregistered successfully.
-- IDLE fo ~10 min
| 2022-11-01 14:58:23 INFO AbstractLivelinessMonitor:127 -
Expired:appattempt_1667306908598_0003_000001 Timed out after 600 secs
rm_1 | 2022-11-01 14:58:23 INFO ApplicationMasterService:684 -
Unregistering app attempt : appattempt_1667306908598_0003_000001
rm_1 | 2022-11-01 14:58:23 INFO AMRMTokenSecretManager:124 -
Application finished, removing password for appattempt_1667306908598_0003_000001
rm_1 | 2022-11-01 14:58:23 INFO RMAppAttemptImpl:813 -
appattempt_1667306908598_0003_000001 State change from FINISHING to FINISHED
rm_1 | 2022-11-01 14:58:23 INFO RMAppImpl:715 -
application_1667306908598_0003 State change from FINISHING to FINISHED
rm_1 | 2022-11-01 14:58:23 INFO CapacityScheduler:882 - Application
Attempt appattempt_1667306908598_0003_000001 is done. finalState=FINISHED
rm_1 | 2022-11-01 14:58:23 INFO RMAuditLogger:148 - USER=hadoop
OPERATION=Application Finished - Succeeded TARGET=RMAppManager
RESULT=SUCCESS APPID=application_1667306908598_0003
rm_1 | 2022-11-01 14:58:23 INFO RMContainerImpl:410 -
container_1667306908598_0003_01_000001 Container Transitioned from RUNNING to
KILLED
rm_1 | 2022-11-01 14:58:23 INFO FiCaSchedulerApp:113 - Completed
container: container_1667306908598_0003_01_000001 in state: KILLED event:KILL
rm_1 | 2022-11-01 14:58:23 INFO AMLauncher:263 - Cleaning master
appattempt_1667306908598_0003_000001
rm_1 | 2022-11-01 14:58:23 INFO RMAuditLogger:106 - USER=hadoop
OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS
APPID=application_1667306908598_0003
CONTAINERID=container_1667306908598_0003_01_000001
rm_1 | 2022-11-01 14:58:23 INFO SchedulerNode:216 - Released container
container_1667306908598_0003_01_000001 of capacity <memory:2048, vCores:1> on
host nm:36773, which currently has 0 containers, <memory:0, vCores:0> used and
<memory:8192, vCores:8> available, release resources=true
rm_1 | 2022-11-01 14:58:23 INFO LeafQueue:1707 - default
used=<memory:0, vCores:0> numContainers=0 user=hadoop user-resources=<memory:0,
vCores:0>
rm_1 | 2022-11-01 14:58:23 INFO LeafQueue:1658 - completedContainer
container=Container: [ContainerId: container_1667306908598_0003_01_000001,
NodeId: nm:36773, NodeHttpAddress: nm:8042, Resource: <memory:2048, vCores:1>,
Priority: 0, Token: Token { kind: ContainerToken, service: 172.20.0.4:36773 },
] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0,
vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1,
numContainers=0 cluster=<memory:8192, vCores:8>
rm_1 | 2022-11-01 14:58:23 INFO ParentQueue:568 - completedContainer
queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0>
cluster=<memory:8192, vCores:8>
rm_1 | 2022-11-01 14:58:23 INFO ParentQueue:585 - Re-sorting completed
queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0,
usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0,
numApps=1, numContainers=0
rm_1 | 2022-11-01 14:58:23 INFO CapacityScheduler:1419 - Application
attempt appattempt_1667306908598_0003_000001 released container
container_1667306908598_0003_01_000001 on node: host: nm:36773 #containers=0
available=<memory:8192, vCores:8> used=<memory:0, vCores:0> with event: KILL
rm_1 | 2022-11-01 14:58:23 INFO AppSchedulingInfo:117 - Application
application_1667306908598_0003 requests cleared {code}
> MR Acceptance test failure on Streaming Branch (HDDS-4454)
> ----------------------------------------------------------
>
> Key: HDDS-6126
> URL: https://issues.apache.org/jira/browse/HDDS-6126
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Sadanand Shenoy
> Assignee: Sadanand Shenoy
> Priority: Major
> Attachments: NM-jstack.txt
>
>
> The ozone MR Acceptance fails on the streaming branch if
> ozone.fs.datastream.enable is set to true.
> The test basically runs a yarn wordcount job and the observation here is
> that the job doesn’t release the Application master (AM container ) as soon
> as the job is done and the job counters are displayed which is why the second
> job waits for a long time ~10 min for the AM container to be released.
> Tried with enabling streaming and disabling streaming and verified that the
> problem occurs only when ozone.fs.datastream.enable=true
> {code:java}
> yarn jar
> /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi -D
> fs.defaultFS=o3fs://bucket1.volume1.om/ -D ozone.fs.datastream.enable=true
> 3 3{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]