[ https://issues.apache.org/jira/browse/HIVE-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218511#comment-15218511 ]
Siddharth Seth commented on HIVE-10280:
---------------------------------------
bq. The code looks reasonable... the logic though, would it mean one temp
failure will make AM discard all tasks on the node?
Yes. The intent is to retry the message first, based on the configured RPC
retry in LlapProtocolClientProxy, so the tasks are discarded only if the
retries fail as well.
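To make the intended behaviour concrete, here is a rough sketch of "retry, and only then treat the node as failed". It is not taken from the patch: SourceStateUpdateSketch, Sender, Outcome and sendWithRetry are hypothetical names, and maxAttempts stands in for whatever the configured RPC retry amounts to.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from the Hive code base. */
final class SourceStateUpdateSketch {

  /** One attempt to deliver a source state update to a daemon (assumed shape). */
  interface Sender {
    void send() throws IOException;
  }

  /** Result of trying to deliver one update to one node. */
  enum Outcome { DELIVERED, NODE_FAILED }

  /**
   * Tries the send up to maxAttempts times. A temporary failure is absorbed by
   * the next attempt; only when every attempt fails is the node reported as
   * failed, at which point the caller would discard the tasks on that node.
   */
  static Outcome sendWithRetry(Sender sender, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        sender.send();
        return Outcome.DELIVERED;
      } catch (IOException e) {
        // Swallow the failure and fall through to the next attempt, if any.
      }
    }
    return Outcome.NODE_FAILED;
  }
}
```

With this shape, a send that fails twice and succeeds on the third of three attempts comes back as DELIVERED, while a send that never succeeds comes back as NODE_FAILED.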
bq. also assume it's safe to mark running tasks as killed from AM perspective
(wrt potential future events from them, etc.); however, should we try to send
kill to them (and ignore the failures) so they don't hog resources? Actually, it
may be a good idea to send a kill if we received a status update from some task
that we declared dead.
Yes, it's safe to mark a running task as KILLED. We could try sending a kill
message, but that will likely not go through either since the state update did
not go through.
If these tasks do successfully send in a heartbeat, they will automatically be
told to die, since they have already been marked as KILLED.
Do you think we should still try sending a kill message?
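For reference, the heartbeat behaviour described above could be sketched roughly as follows. This is an illustration only, not the actual AM code: HeartbeatSketch, TaskState, register, markKilled and heartbeat are hypothetical names, and telling unknown tasks to die is an assumption of the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the AM-side view of task state; not Hive code. */
final class HeartbeatSketch {

  enum TaskState { RUNNING, KILLED }

  private final Map<String, TaskState> tasks = new ConcurrentHashMap<>();

  /** Records a task as running from the AM's perspective. */
  void register(String taskId) {
    tasks.put(taskId, TaskState.RUNNING);
  }

  /** Marks a known task as KILLED, e.g. after its node was declared bad. */
  void markKilled(String taskId) {
    tasks.replace(taskId, TaskState.KILLED);
  }

  /**
   * Handles a heartbeat from a task and returns true if the task should die.
   * A task already marked KILLED is told to die on its next heartbeat, so no
   * separate kill message is strictly required for it to stop.
   * Assumption: a task the AM does not know about is also told to die.
   */
  boolean heartbeat(String taskId) {
    return tasks.get(taskId) != TaskState.RUNNING;
  }
}
```

In this sketch a task keeps running as long as its heartbeats return false; once markKilled has been called for it, the next heartbeat that gets through returns true.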
> LLAP: Handle errors while sending source state updates to the daemons
> ---------------------------------------------------------------------
>
> Key: HIVE-10280
> URL: https://issues.apache.org/jira/browse/HIVE-10280
> Project: Hive
> Issue Type: Sub-task
> Components: llap
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: HIVE-10280.1.patch
>
>
> Will likely be handled as marking the node as bad. May need a retry policy in
> place though before marking a node bad to handle temporary network glitches.