[ https://issues.apache.org/jira/browse/MAPREDUCE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013996#comment-17013996 ]
shilongfei commented on MAPREDUCE-7253: --------------------------------------- The log of this failed map is as follows. The first attempt is scheduled at 9:46, the resource is allocated at 9:50, the second attempt is scheduled at 9:52, and no subsequent resources are allocated. {code:java} 2020-01-02 09:46:14,181 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1577272892024_639901_m_007585_0 TaskAttempt Transitioned from NEW to UNASSIGNED 2020-01-02 09:50:11,102 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1577272892024_639901_m_007585_0 TaskAttempt Transitioned from UNASSIGNED to ASSIGNED 2020-01-02 09:50:11,146 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1577272892024_639901_m_007585_0 TaskAttempt Transitioned from ASSIGNED to RUNNING 2020-01-02 09:52:11,944 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1577272892024_639901_m_007585_1 TaskAttempt Transitioned from NEW to UNASSIGNED {code} >From the AM allocation container log, it can be seen that the first attempt to >the current map is assigned to the container at 9:50, and the second attempt >has not been {code:java} 2020-01-02 09:46:16,377 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e922_1577272892024_639901_01_000002 to attempt_1577272892024_639901_m_000250_0 2020-01-02 09:46:16,379 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e922_1577272892024_639901_01_000003 to attempt_1577272892024_639901_m_000839_0 ...... 2020-01-02 09:50:11,095 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_e922_1577272892024_639901_01_016146 to attempt_1577272892024_639901_m_007585_0 ...... {code} Looking at the container application log, it can be found that after 9:52 AM has applied for resources, but there is no second attempt allocated to the map. {code:java} 2020-01-02 09:52:30,084 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1 2020-01-02 09:52:31,093 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3 2020-01-02 09:52:33,110 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1 2020-01-02 09:52:34,119 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1 2020-01-02 09:55:11,234 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1 {code} The requested resources are allocated to reduce {code:java} 2020-01-02 09:52:31,093 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce 2020-01-02 09:52:33,110 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce 2020-01-02 09:52:34,119 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce 2020-01-02 09:55:11,234 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce{code} Because the timeout event does not carry the rescheduler flag {code:java} if(taskTimedOut) { // task is lost, remove from the list and raise lost event iterator.remove(); eventHandler.handle(new TaskAttemptDiagnosticsUpdateEvent(entry .getKey(), "AttemptID:" + entry.getKey().toString() + " Timed out after " + taskTimeOut / 1000 + " secs")); eventHandler.handle(new TaskAttemptEvent(entry.getKey(), TaskAttemptEventType.TA_TIMED_OUT)); }{code} > MR job got hanged forever when map task timeout > ----------------------------------------------- > > Key: MAPREDUCE-7253 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7253 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: shilongfei > Priority: Major > Attachments: 2020-01-02 11-15-42屏幕截图.png, 2020-01-02 > 11-15-58屏幕截图.png, 2020-01-02 11-16-14屏幕截图.png > > > When the MR map job fails and the reduce job has started to be scheduled, the > failed map will not be scheduled, it will always be in the NEW state, and the > reduce job will also be blocked. > So why was the map job not scheduled in time? !2020-01-02 11-15-58屏幕截图.png! > !2020-01-02 11-15-42屏幕截图.png! > !2020-01-02 11-16-14屏幕截图.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org