[jira] [Resolved] (HIVE-23500) [Kubernetes] Use Extend NodeId for LLAP registration

Attila Magyar (Jira) Tue, 19 May 2020 01:12:10 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-23500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Attila Magyar resolved HIVE-23500.
----------------------------------
    Resolution: Duplicate

> [Kubernetes] Use Extend NodeId for LLAP registration
> ----------------------------------------------------
>
>                 Key: HIVE-23500
>                 URL: https://issues.apache.org/jira/browse/HIVE-23500
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>
>
> In a Kubernetes environment, where pods can have the same hostname and port,
> node trackers can retain an old instance of a pod in their cache. In Hive
> LLAP, the LLAP Tez task scheduler maintains node membership based on
> ZooKeeper registry events, so a NODE_ADDED event followed by a NODE_REMOVED
> event can end up removing the node/host from the node trackers because the
> hostname and service port are stable. The NODE_REMOVED event in this case is
> a stale event for the already dead pod, which ZK sends only after the
> session timeout (in case of a non-graceful shutdown). If this sequence of
> events happens, a node/host is completely lost from the scheduler's
> perspective.
> To support this scenario, Tez can extend YARN's NodeId to include a
> uniqueIdentifier. The LLAP task scheduler can then construct the container
> object with this new NodeId, so that stale events like the one above only
> remove the host/node that matches the old uniqueIdentifier.
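
The proposal above can be illustrated with a minimal sketch. It is not the Tez or YARN API: ExtendedNodeId, NodeTracker, the hostname, the port, and the instance identifiers are all hypothetical names chosen for illustration. It only shows how matching on a unique identifier might let a tracker ignore a stale NODE_REMOVED event for a pod that has since been replaced at the same host and port.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExtendedNodeId:
    """Hypothetical stand-in for a NodeId extended with a unique identifier."""
    host: str
    port: int
    unique_identifier: str


class NodeTracker:
    """Toy tracker keyed by (host, port), which stay stable across pod restarts."""

    def __init__(self) -> None:
        self._nodes: Dict[Tuple[str, int], ExtendedNodeId] = {}

    def on_node_added(self, node_id: ExtendedNodeId) -> None:
        # A restarted pod reuses host and port, so this replaces the old entry.
        self._nodes[(node_id.host, node_id.port)] = node_id

    def on_node_removed(self, node_id: ExtendedNodeId) -> bool:
        key = (node_id.host, node_id.port)
        current = self._nodes.get(key)
        # Remove only if the event refers to the instance currently tracked;
        # a stale event carries the old identifier and is ignored.
        if current is not None and current.unique_identifier == node_id.unique_identifier:
            del self._nodes[key]
            return True
        return False

    def is_tracked(self, host: str, port: int) -> bool:
        return (host, port) in self._nodes


# Assumed example values: same host and port, different pod instances.
old_pod = ExtendedNodeId("llap-0.llap.svc", 15001, "instance-a")
new_pod = ExtendedNodeId("llap-0.llap.svc", 15001, "instance-b")

tracker = NodeTracker()
tracker.on_node_added(old_pod)
tracker.on_node_added(new_pod)                   # NODE_ADDED for the replacement pod
stale_removed = tracker.on_node_removed(old_pod)  # late NODE_REMOVED for the dead pod
still_tracked = tracker.is_tracked("llap-0.llap.svc", 15001)
```

In this sketch the late removal event for the dead pod leaves the replacement pod registered, whereas a tracker that compared only host and port would have dropped it.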



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
