[
https://issues.apache.org/jira/browse/TEZ-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306219#comment-15306219
]
Feng Yuan commented on TEZ-3273:
--------------------------------
I found that many containers were allocated to a single task attempt:
2016-05-30 02:10:38,856 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Assigning container to task, container=Container:
[ContainerId: container_1463493135662_117553_01_000032, NodeId:
bjlg-44p91-hadoop77.bfdabc.com:35404, NodeHttpAddress:
bjlg-44p91-hadoop77.bfdabc.com:8042, Resource: <memory:3072, vCores:1>,
Priority: 2, Token: Token { kind: ContainerToken, service: 192.168.44.91:35404
}, ], task=attempt_1463493135662_117553_1_00_000023_0,
containerHost=bjlg-44p91-hadoop77.bfdabc.com, localityMatchType=RackLocal,
matchedLocation=/rack1, honorLocalityFlags=false, reusedContainer=false,
delayedContainers=7, containerResourceMemory=3072, containerResourceVCores=1
[hadoop@bjlg-44p40-hadoop27 yuanfeng]$ cat 553 | grep "task=attempt_1463493135662_117553_1_00_000024_0, containerHost="
2016-05-30 02:10:38,881 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Assigning container to task, container=Container:
[ContainerId: container_1463493135662_117553_01_000044, NodeId:
bjlg-44p62-hadoop48.bfdabc.com:22928, NodeHttpAddress:
bjlg-44p62-hadoop48.bfdabc.com:8042, Resource: <memory:3072, vCores:1>,
Priority: 2, Token: Token { kind: ContainerToken, service: 192.168.44.62:22928
}, ], task=attempt_1463493135662_117553_1_00_000024_0,
containerHost=bjlg-44p62-hadoop48.bfdabc.com, localityMatchType=RackLocal,
matchedLocation=/rack0, honorLocalityFlags=false, reusedContainer=false,
delayedContainers=2, containerResourceMemory=3072, containerResourceVCores=1
2016-05-30 02:10:43,916 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Assigning container to task, container=Container:
[ContainerId: container_1463493135662_117553_01_000048, NodeId:
bjlg-44p50-hadoop36.bfdabc.com:40906, NodeHttpAddress:
bjlg-44p50-hadoop36.bfdabc.com:8042, Resource: <memory:3072, vCores:1>,
Priority: 2, Token: Token { kind: ContainerToken, service: 192.168.44.50:40906
}, ], task=attempt_1463493135662_117553_1_00_000024_0,
containerHost=bjlg-44p50-hadoop36.bfdabc.com, localityMatchType=RackLocal,
matchedLocation=/rack0, honorLocalityFlags=false, reusedContainer=false,
delayedContainers=4, containerResourceMemory=3072, containerResourceVCores=1
2016-05-30 02:10:44,415 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Assigning container to task, container=Container:
[ContainerId: container_1463493135662_117553_01_000007, NodeId:
bjlg-44p82-hadoop68.bfdabc.com:63544, NodeHttpAddress:
bjlg-44p82-hadoop68.bfdabc.com:8042, Resource: <memory:3072, vCores:1>,
Priority: 2, Token: Token { kind: ContainerToken, service: 192.168.44.82:63544
}, ], task=attempt_1463493135662_117553_1_00_000024_0,
containerHost=bjlg-44p82-hadoop68.bfdabc.com, localityMatchType=RackLocal,
matchedLocation=/rack0, honorLocalityFlags=false, reusedContainer=true,
delayedContainers=5, containerResourceMemory=3072, containerResourceVCores=1
2016-05-30 02:10:44,419 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Assigning container to task, container=Container:
[ContainerId: container_1463493135662_117553_01_000022, NodeId:
bjlg-44p39-hadoop26.bfdabc.com:4334, NodeHttpAddress:
bjlg-44p39-hadoop26.bfdabc.com:8042, Resource: <memory:3072, vCores:1>,
Priority: 2, Token: Token { kind: ContainerToken, service: 192.168.44.39:4334
}, ], task=attempt_1463493135662_117553_1_00_000024_0,
containerHost=bjlg-44p39-hadoop26.bfdabc.com, localityMatchType=RackLocal,
matchedLocation=/rack0, honorLocalityFlags=false, reusedContainer=true,
delayedContainers=2, containerResourceMemory=3072, containerResourceVCores=1
2016-05-30 02:10:44,421 INFO [DelayedContainerManager]
rm.YarnTaskSchedulerService: Assigning container to task, container=Container:
[ContainerId: container_1463493135662_117553_01_000054, NodeId:
bjlg-44p43-hadoop29.bfdabc.com:65059, NodeHttpAddress:
bjlg-44p43-hadoop29.bfdabc.com:8042, Resource: <memory:3072, vCores:1>,
Priority: 2, Token: Token { kind: ContainerToken, service: 192.168.44.43:65059
}, ], task=attempt_1463493135662_117553_1_00_000024_0,
containerHost=bjlg-44p43-hadoop29.bfdabc.com, localityMatchType=RackLocal,
matchedLocation=/rack0, honorLocalityFlags=false, reusedContainer=false,
delayedContainers=0, containerResourceMemory=3072, containerResourceVCores=1
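The grep above checks one attempt at a time. To scan a whole AM log for the same symptom, the `Assigning container to task` entries can be grouped by task attempt id. The following is a minimal Python sketch. It assumes the entry format shown in the excerpt above, including the hard wrapping. The helpers `containers_per_attempt` and `over_assigned` are hypothetical and are not part of Tez.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches one "Assigning container to task" entry in the format shown in the
# log excerpt (an assumption). DOTALL lets ".*?" cross the line breaks left by
# mail wrapping; the non-greedy match stops at the nearest "task=" field.
ASSIGN_RE = re.compile(
    r"Assigning container to task, container=Container:\s*"
    r"\[ContainerId: (container_\S+?),.*?"
    r"task=(attempt_\S+?),",
    re.DOTALL,
)


def containers_per_attempt(log_text: str) -> Dict[str, List[str]]:
    """Map each task attempt id to the container ids assigned to it, in log order."""
    assignments: Dict[str, List[str]] = defaultdict(list)
    for container_id, attempt_id in ASSIGN_RE.findall(log_text):
        assignments[attempt_id].append(container_id)
    return dict(assignments)


def over_assigned(log_text: str) -> Dict[str, List[str]]:
    """Keep only attempts that were assigned more than one container."""
    return {
        attempt: containers
        for attempt, containers in containers_per_attempt(log_text).items()
        if len(containers) > 1
    }
```

For example, `over_assigned(open("553").read())` should list every attempt in the grepped log file that received more than one container. A retried task gets a new attempt id (a different trailing suffix), so a single attempt id should map to only one container.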
> app.TaskAttemptListenerImpTezDag: Attempt is not recognized for heartbeat in
> tez 0.5.2,cause job hang
> -----------------------------------------------------------------------------------------------------
>
> Key: TEZ-3273
> URL: https://issues.apache.org/jira/browse/TEZ-3273
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.2
> Environment: Hive 0.14, Hadoop 2.6
> Reporter: Feng Yuan
> Priority: Critical
> Attachments: app_logs.zip
>
>
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> Map 1: 145(+0,-1)/146 Reducer 2: 0/415
> The job is stuck forever.
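The stuck progress line is easier to read once its counters are spelled out. I am assuming that Hive's Tez console prints `<vertex>: completed(+running,-failed)/total` and omits the `-failed` part when nothing has failed; this should be checked against `TezJobMonitor` in the Hive version in use. On that reading, `Map 1: 145(+0,-1)/146` means 145 of 146 map tasks succeeded, one failed, and none is running, while `Reducer 2: 0/415` has not started. The Python sketch below parses that format and flags a stall. `parse_progress`, `is_stalled`, and the repeat threshold are hypothetical names and are not part of Hive or Tez.

```python
import re
from typing import Dict, List, NamedTuple

# Assumed console format: "<vertex>: completed(+running,-failed)/total", where the
# "(+running,-failed)" part (or just "-failed") may be absent. Vertices printed in
# any other form are skipped.
PROGRESS_RE = re.compile(r"([A-Za-z]+ \d+): (\d+)(?:\(\+(\d+)(?:,-(\d+))?\))?/(\d+)")


class VertexProgress(NamedTuple):
    completed: int
    running: int
    failed: int
    total: int


def parse_progress(line: str) -> Dict[str, VertexProgress]:
    """Parse one progress line into per-vertex counters (missing counters become 0)."""
    result: Dict[str, VertexProgress] = {}
    for name, done, running, failed, total in PROGRESS_RE.findall(line):
        result[name] = VertexProgress(int(done), int(running or 0), int(failed or 0), int(total))
    return result


def is_stalled(lines: List[str], min_repeats: int) -> bool:
    """Heuristic: the last `min_repeats` lines are identical, no task is running,
    and at least one vertex is still incomplete."""
    if len(lines) < min_repeats:
        return False
    tail = [parse_progress(line) for line in lines[-min_repeats:]]
    if any(snapshot != tail[0] for snapshot in tail):
        return False
    last = tail[0]
    return (
        bool(last)
        and all(v.running == 0 for v in last.values())
        and any(v.completed < v.total for v in last.values())
    )
```

With the seven identical lines quoted above, `is_stalled(lines, min_repeats=5)` returns `True`. A line with any running task, such as `145(+1,-1)/146`, does not count as a stall.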
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)