[ https://issues.apache.org/jira/browse/HIVE-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218511#comment-15218511 ]

Siddharth Seth commented on HIVE-10280:
---------------------------------------

bq. The code looks reasonable. As for the logic, though: would it mean that one temporary failure makes the AM discard all tasks on the node?
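Yes. The intent is for the message to be retried first, according to the RPC retry policy configured in LlapProtocolClientProxy, so the tasks are discarded only if the update still fails after those retries.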
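A minimal sketch of that AM-side consequence, using hypothetical names (`NodeFailureHandler`, `onSourceStateUpdateFailed`, `TaskState`) that are not Hive's actual classes. The handler is assumed to run only after the RPC retries have been exhausted; it then marks the node bad and marks every task still running on it as KILLED.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical AM-side task states for this sketch (not Hive/Tez types).
enum TaskState { RUNNING, SUCCEEDED, KILLED }

// Hypothetical bookkeeping class; illustrates the policy, not the patch.
class NodeFailureHandler {
    private final Map<String, Set<String>> tasksByNode = new HashMap<>();
    private final Map<String, TaskState> taskStates = new HashMap<>();
    private final Set<String> badNodes = new HashSet<>();

    void registerTask(String node, String taskId) {
        tasksByNode.computeIfAbsent(node, n -> new HashSet<>()).add(taskId);
        taskStates.put(taskId, TaskState.RUNNING);
    }

    void markSucceeded(String taskId) {
        taskStates.put(taskId, TaskState.SUCCEEDED);
    }

    // Invoked only after the RPC layer has exhausted its configured retries
    // for a source state update sent to `node`.
    void onSourceStateUpdateFailed(String node) {
        badNodes.add(node);
        for (String taskId : tasksByNode.getOrDefault(node, Collections.emptySet())) {
            // Running tasks are marked KILLED; finished tasks are left alone.
            if (taskStates.get(taskId) == TaskState.RUNNING) {
                taskStates.put(taskId, TaskState.KILLED);
            }
        }
    }

    boolean isBad(String node) {
        return badNodes.contains(node);
    }

    TaskState stateOf(String taskId) {
        return taskStates.get(taskId);
    }
}
```

Other nodes are unaffected: a failed update to one daemon only discards the tasks that daemon was running.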

bq. I also assume it's safe to mark running tasks as killed from the AM's perspective (with respect to potential future events from them, etc.). However, should we try to send a kill to them (and ignore the failures) so they don't hog resources? Actually, it may be a good idea to send a kill if we receive a status update from a task that we have declared dead.
Yes, it's safe to mark a running task as KILLED. We could try sending a kill message, but it would most likely fail as well, since the state update to the same node did not go through.
If these tasks do manage to send a heartbeat, they will automatically be told to die, since they have already been marked as KILLED.
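A minimal sketch of the heartbeat side, with hypothetical names (`HeartbeatHandler`, `HeartbeatAction`, `AttemptStatus`): a heartbeat from an attempt the AM has already marked KILLED gets a response telling it to die. Treating unknown attempt IDs the same way is an assumption of this sketch, not something stated in the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical states and responses for this sketch only.
enum AttemptStatus { RUNNING, KILLED }

enum HeartbeatAction { CONTINUE, SHOULD_DIE }

class HeartbeatHandler {
    private final Map<String, AttemptStatus> attempts = new HashMap<>();

    void register(String attemptId) {
        attempts.put(attemptId, AttemptStatus.RUNNING);
    }

    void markKilled(String attemptId) {
        attempts.put(attemptId, AttemptStatus.KILLED);
    }

    // Only attempts the AM still considers RUNNING may continue. KILLED
    // attempts (and, as an assumption here, unknown ones) are told to die
    // in the heartbeat response itself, with no separate kill RPC.
    HeartbeatAction onHeartbeat(String attemptId) {
        AttemptStatus status = attempts.get(attemptId);
        return status == AttemptStatus.RUNNING
                ? HeartbeatAction.CONTINUE
                : HeartbeatAction.SHOULD_DIE;
    }
}
```

This is why an explicit kill message is mostly redundant: a task that can reach the AM learns its fate on its next heartbeat, and a task that cannot reach the AM probably could not receive a kill either.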

Do you think we should still try sending a kill message?

> LLAP: Handle errors while sending source state updates to the daemons
> ---------------------------------------------------------------------
>
>                 Key: HIVE-10280
>                 URL: https://issues.apache.org/jira/browse/HIVE-10280
>             Project: Hive
>          Issue Type: Sub-task
>          Components: llap
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-10280.1.patch
>
>
> This will likely be handled by marking the node as bad. A retry policy may be
> needed before marking a node bad, though, to tolerate temporary network glitches.
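A minimal sketch of such a retry policy, assuming capped exponential backoff (a choice made for this sketch; the issue does not specify a schedule). `RetryPolicy`, `run` and `delayBeforeRetry` are hypothetical names, not the LlapProtocolClientProxy API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical retry helper; LlapProtocolClientProxy's real retry
// configuration is not reproduced here.
class RetryPolicy {
    private final int maxAttempts;
    private final long baseDelayMs;
    private final long maxDelayMs;

    RetryPolicy(int maxAttempts, long baseDelayMs, long maxDelayMs) {
        if (maxAttempts < 1 || baseDelayMs < 0 || maxDelayMs < 0) {
            throw new IllegalArgumentException("invalid retry settings");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Delay after failed attempt `attempt` (0-based):
    // baseDelayMs * 2^attempt, capped at maxDelayMs.
    long delayBeforeRetry(int attempt) {
        int shift = Math.min(attempt, 30);
        // Compare before shifting so large base delays cannot overflow.
        if (baseDelayMs > (maxDelayMs >> shift)) {
            return maxDelayMs;
        }
        return baseDelayMs << shift;
    }

    // Calls `send` until it succeeds or maxAttempts is reached. Returns false
    // only if every attempt failed; the caller would then mark the node bad.
    boolean run(BooleanSupplier send) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (send.getAsBoolean()) {
                return true;
            }
            if (attempt + 1 < maxAttempts) {
                try {
                    Thread.sleep(delayBeforeRetry(attempt));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

With `new RetryPolicy(5, 100, 1000)`, the waits between attempts would be 100, 200, 400 and 800 ms, and the node would only be marked bad if all five attempts fail.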



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
