[ 
https://issues.apache.org/jira/browse/TEZ-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated TEZ-2317:
------------------------------------
    Attachment: AM-taskkill.log

 For a complex DAG that generated a lot of events, which could not be 
processed fast enough, we (me and [~bikassaha]) saw many tasks killed because 
only TA_SCHEDULE had been processed: before the RUNNING event was processed, a 
commit go/no-go request arrived for the task. That request is a separate async 
call that does not go through the event queue. These issues were mostly with 
the ONE-ONE edges Pig was using for distributed order by with sampling. Since 
those tasks were not doing much except partitioning, they were also finishing 
very fast.

Issues to fix:
   - Optimize by not sending a commit go/no-go request if there is no HDFS 
output (DataSink) involved. In the above case, the output is always 
intermediate.
   - Handle the commit go/no-go request only after processing the events 
already in the event queue, for example by asking the task to come back after 
some time.
   - For 3058 KilledTaskAttempts we saw 383519 TA_KILL_REQUEST events, roughly 
125 per killed attempt. This is far too high. 
   - The attached AM-taskkill.log, which has the grepped statements for a 
single task that was killed, has 327 repeats of the message below. Need to 
find out why there are so many and fix that. 
{code}
2015-04-13 23:19:11,126 INFO [IPC Server handler 22 on 53043] app.TaskAttemptListenerImpTezDag: Commit go/no-go request from attempt_1428329756093_374362_1_29_008426_0
2015-04-13 23:19:11,126 INFO [IPC Server handler 22 on 53043] impl.TaskImpl: Task not running. Issuing kill to bad commit attempt attempt_1428329756093_374362_1_29_008426_0
{code}
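
A rough sketch of how the first two items in the list above might fit together, for discussion only. Nothing here is actual Tez code: CommitGate, TaskState, CommitReply, shouldAskForCommit and canCommit are all hypothetical names, and the real state machine has more states than the three modelled here. The idea is that a task with no DataSink never asks, and a request that arrives before the RUNNING transition has been processed gets a "come back later" answer instead of a kill.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: none of these types or methods are Tez classes.
class CommitGate {
    enum TaskState { SCHEDULED, RUNNING, SUCCEEDED }
    enum CommitReply { GO, RETRY_LATER, NO_GO }

    // Task state as last seen by the event-queue thread.
    private final Map<String, TaskState> states = new ConcurrentHashMap<>();
    // The attempt that has been granted the commit for each task.
    private final Map<String, String> committers = new ConcurrentHashMap<>();

    // Task side: with no DataSink the output is intermediate, so skip the request.
    static boolean shouldAskForCommit(int dataSinkCount) {
        return dataSinkCount > 0;
    }

    // Called by the event-queue thread whenever it processes a transition.
    void onStateChange(String taskId, TaskState newState) {
        states.put(taskId, newState);
    }

    // Called from an RPC handler thread, i.e. outside the event queue.
    CommitReply canCommit(String taskId, String attemptId) {
        TaskState state = states.get(taskId);
        if (state == null || state == TaskState.SCHEDULED) {
            // The RUNNING event may still be sitting in the queue:
            // ask the attempt to come back instead of killing it.
            return CommitReply.RETRY_LATER;
        }
        if (state == TaskState.RUNNING) {
            // The first attempt to ask wins; repeated asks from it still get GO.
            String winner = committers.putIfAbsent(taskId, attemptId);
            return (winner == null || winner.equals(attemptId))
                    ? CommitReply.GO
                    : CommitReply.NO_GO;
        }
        // The task has already finished with some other attempt.
        return CommitReply.NO_GO;
    }
}
```

With something like this, an attempt that finishes before its RUNNING event is processed would poll again rather than trigger "Task not running. Issuing kill to bad commit attempt". The attempt would need a bounded retry interval, which the sketch leaves out.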

Please create separate jiras as required.

> Successful task attempts getting killed
> ---------------------------------------
>
>                 Key: TEZ-2317
>                 URL: https://issues.apache.org/jira/browse/TEZ-2317
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>         Attachments: AM-taskkill.log
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
