[ https://issues.apache.org/jira/browse/MAPREDUCE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430022#comment-16430022 ]

Xuan Gong commented on MAPREDUCE-7042:
--------------------------------------

Here is what actually happens:
1) We send a kill signal to the AM, and the AM starts to shut down gracefully. 
The first service the AM stops is JobHistoryEventHandler. 
2) JobHistoryEventHandler handles ATS events and JHS events together. Right 
now, the event handler grabs the lock and tries to publish an ATS entity to 
ATS v2. Because of a connection issue (whatever causes the 
SocketTimeoutException), the event handler blocks, so the whole process of 
stopping JobHistoryEventHandler blocks as well.
3) When we send the kill signal to the AM, 
"yarn.app.mapreduce.am.hard-kill-timeout-ms" (10s by default) applies. After 
waiting 10s, a kill request is sent to the RM and the application is forcibly 
killed.
4) The event handler is still blocked at that point, so it never gets a chance 
to handle the remaining JHS events before the forced kill.

So, we cannot find anything for this application in JHS.
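
To make the sequence above concrete, here is a toy Java model of it. This is 
only a sketch, not the actual JobHistoryEventHandler code: the class, the 
method names, the event strings and the timeouts are all made up for 
illustration, and the "hung ATS publish" is simulated with a latch that is 
never released. It assumes one worker thread handles both event types under 
one lock, and that stop() gives up after a timeout.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model only: none of these names are Hadoop classes or APIs.
class BlockedHandlerSketch {
    private static final String STOP = "STOP";

    private final Object lock = new Object();
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Never counted down: models a publish call that never returns.
    private final CountDownLatch neverReleased = new CountDownLatch(1);
    private final List<String> writtenToJhs = new CopyOnWriteArrayList<>();
    private final boolean atsHangs;
    private final Thread worker;

    BlockedHandlerSketch(boolean atsHangs) {
        this.atsHangs = atsHangs;
        this.worker = new Thread(this::drain, "event-handler");
        this.worker.setDaemon(true); // let the JVM exit even if the worker is stuck
    }

    // Single thread handles ATS and JHS events in order, under one lock.
    private void drain() {
        try {
            while (true) {
                String event = queue.take();
                if (event.equals(STOP)) {
                    return;
                }
                synchronized (lock) {
                    if (event.startsWith("ATS")) {
                        if (atsHangs) {
                            neverReleased.await(); // simulated hung publish
                        }
                    } else {
                        writtenToJhs.add(event); // simulated history write
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Graceful stop: waits up to timeoutMs for the worker to drain the queue.
    // Returns false when the deadline passes first (the "forced kill" case).
    boolean stop(long timeoutMs) throws InterruptedException {
        queue.add(STOP);
        worker.join(timeoutMs);
        return !worker.isAlive();
    }

    static String simulate(boolean atsHangs, long timeoutMs) {
        BlockedHandlerSketch handler = new BlockedHandlerSketch(atsHangs);
        handler.worker.start();
        handler.queue.add("ATS:JOB_KILLED");
        handler.queue.add("JHS:JOB_KILLED");
        try {
            boolean graceful = handler.stop(timeoutMs);
            return "graceful=" + graceful + " jhs=" + handler.writtenToJhs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(true, 200));   // graceful=false jhs=[]
        System.out.println(simulate(false, 2000)); // graceful=true jhs=[JHS:JOB_KILLED]
    }
}
```

In the hung case the JHS event is still sitting in the queue behind the ATS 
event when the stop deadline expires, which matches step 4) above.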

> Killed mapreduce job data does not move to mapreduce.jobhistory.done-dir when 
> ATS v2 is enabled
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7042
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7042
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Yesha Vora
>            Assignee: Xuan Gong
>            Priority: Major
>
> Steps:
> 1) Start a mapreduce job
> {code}
> hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-3.0.0.3.0.0.0-751-tests.jar sleep "-Dmapreduce.job.user.name=hrt_qa" -m 10 -r 1 -mt 1000 -rt 1000
> {code}
> 2) Kill the job
> 3) Validate that the job is present in mapreduce.jobhistory.done-dir
> Result: mapreduce.jobhistory.done-dir does not have a job_1516776025831_0048 entry.


