[ 
https://issues.apache.org/jira/browse/TEZ-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116312#comment-17116312
 ] 

Ashutosh Chauhan commented on TEZ-4181:
---------------------------------------

+1

> [Kubernetes] Use hostname + pod UID for shuffle manager caching
> ---------------------------------------------------------------
>
>                 Key: TEZ-4181
>                 URL: https://issues.apache.org/jira/browse/TEZ-4181
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>         Attachments: TEZ-4181.patch
>
>
> When a pod restarts, it keeps the same hostname and shuffle port. When 
> fetcher threads connect to download shuffle data, they use the cached 
> connection info, but because the pod has died, its shuffle data has been 
> cleaned up. The restarted pod then receives connections from clients asking 
> for specific shuffle data that the daemon no longer has.
> In ShuffleManager.java's knownSrcHosts, the key should be changed to HostInfo, 
> which combines host+port with the host's unique ID. The host ID 
> changes when a node is killed or restarted.
>  
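
To illustrate the keying idea in the description, here is a minimal sketch, not the code from TEZ-4181.patch. The class HostKey, its fields, and the host, port and UID values are all hypothetical; the actual HostInfo class in Tez may look different. It only shows that once the unique ID is part of equals/hashCode, a restarted pod with the same host and port no longer matches the stale cache entry.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class HostKeySketch {

    // Hypothetical cache key: host + port + a per-instance unique ID
    // (for example the pod UID). Not the real Tez HostInfo class.
    public static final class HostKey {
        private final String host;
        private final int port;
        private final String uniqueId;

        public HostKey(String host, int port, String uniqueId) {
            this.host = host;
            this.port = port;
            this.uniqueId = uniqueId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof HostKey)) {
                return false;
            }
            HostKey other = (HostKey) o;
            // The unique ID takes part in equality, so a restart yields a new key.
            return port == other.port
                && host.equals(other.host)
                && uniqueId.equals(other.uniqueId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, uniqueId);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a knownSrcHosts-style cache; the value type is illustrative.
        Map<HostKey, String> knownHosts = new ConcurrentHashMap<>();
        knownHosts.put(new HostKey("pod-0", 15551, "uid-a"), "cached before restart");

        // Same host and port, but a new UID after the restart: cache miss.
        System.out.println(knownHosts.containsKey(new HostKey("pod-0", 15551, "uid-b")));
        // Same host, port and UID: cache hit.
        System.out.println(knownHosts.containsKey(new HostKey("pod-0", 15551, "uid-a")));
    }
}
```

With a plain host+port key, both lookups would hit the same entry, which is the stale-connection case the description reports.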



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
