[ 
https://issues.apache.org/jira/browse/HIVE-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053836#comment-15053836
 ] 

Siddharth Seth commented on HIVE-12659:
---------------------------------------

This is related to the JIRA that attempts to detect instances of an LLAP 
cluster going down.
Ideally, we should be able to get enough information from the registry in 
ZooKeeper to decide whether to keep retrying the query or to exit.
Alternatively, we can start tracking the status of individual nodes to decide 
that an LLAP cluster is in an 'unhealthy' state.
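
To make the two options above concrete, here is a rough, self-contained sketch
of how they could be combined. It is not existing Hive or LLAP code: the class
LlapClusterHealthTracker, its methods, and the maxConsecutiveFailures threshold
are all hypothetical names chosen for illustration. It assumes some registry
listener (for example, one watching the ZooKeeper registry) reports when nodes
appear and disappear, and that the AM reports task outcomes per node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not part of Hive, LLAP, or Tez. */
final class LlapClusterHealthTracker {

    enum Decision { RETRY, FAIL_FAST }

    // Assumed threshold: consecutive task kills after which a node counts as down.
    private final int maxConsecutiveFailures;

    // Consecutive failure count for each node currently present in the registry.
    private final Map<String, Integer> failuresByNode = new ConcurrentHashMap<>();

    LlapClusterHealthTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Called when the registry reports that a node has appeared.
    void nodeRegistered(String nodeId) {
        failuresByNode.put(nodeId, 0);
    }

    // Called when the registry reports that a node has gone away.
    void nodeDeregistered(String nodeId) {
        failuresByNode.remove(nodeId);
    }

    // A successful task resets the failure count of a known node.
    void taskSucceeded(String nodeId) {
        failuresByNode.computeIfPresent(nodeId, (id, count) -> 0);
    }

    // A killed task increments the failure count of a known node.
    void taskKilled(String nodeId) {
        failuresByNode.computeIfPresent(nodeId, (id, count) -> count + 1);
    }

    // Retry while at least one registered node is below the failure threshold;
    // otherwise (no nodes registered, or all marked down) fail the query early.
    Decision decide() {
        for (int failures : failuresByNode.values()) {
            if (failures < maxConsecutiveFailures) {
                return Decision.RETRY;
            }
        }
        return Decision.FAIL_FAST;
    }
}
```

With something along these lines, the scheduler would call decide() before
resubmitting a killed task and surface an error to the client on FAIL_FAST
instead of retrying indefinitely. The right threshold, and whether an empty
registry should always mean failing fast, are open design questions.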

> LLAP should detect all nodes down state and stop issuing queries
> ----------------------------------------------------------------
>
>                 Key: HIVE-12659
>                 URL: https://issues.apache.org/jira/browse/HIVE-12659
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Prasanth Jayachandran
>
> I ran a simple query with 1 task in LLAP, and for some reason the LLAP daemon 
> was down (all-nodes-down scenario). Queries were still submitted repeatedly 
> to the daemon and killed by the Tez AM indefinitely. A single task was killed 
> over 20 times, and I had to press Ctrl+C. We need to detect all-nodes-down 
> scenarios (using ZooKeeper?), notify the client, and fail early.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
