[
https://issues.apache.org/jira/browse/TEZ-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Magyar updated TEZ-4181:
-------------------------------
Attachment: TEZ-4181.patch
> [Kubernetes] Use hostname + pod UID for shuffle manager caching
> ---------------------------------------------------------------
>
> Key: TEZ-4181
> URL: https://issues.apache.org/jira/browse/TEZ-4181
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Attila Magyar
> Assignee: Attila Magyar
> Priority: Major
> Attachments: TEZ-4181.patch
>
>
> When a pod restarts, it comes back with the same hostname and shuffle port.
> When fetcher threads connect to download shuffle data, they use the cached
> connection info, but because the pod died, its shuffle data has been
> cleaned up. The restarted pod therefore receives requests from clients for
> specific shuffle data that the daemon no longer has, because of the
> restart.
> In ShuffleManager.java, the key of knownSrcHosts should be changed to HostInfo,
> which combines host+port with the host's unique ID. The host's unique
> ID changes when a node is killed or restarted.
>
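The keying change described above could look roughly like the following. This is only a minimal sketch, not the contents of TEZ-4181.patch: the HostInfo fields, the KnownSrcHostsSketch class, the String value type, and the example host, port, and UID values are all assumptions made for illustration. The point it shows is that once the unique ID takes part in equals/hashCode, a restarted pod with the same hostname and port no longer matches the stale cache entry.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the HostInfo key named in the issue:
// host + port plus a unique ID (for example the pod UID).
final class HostInfo {
    private final String host;
    private final int port;
    private final String uniqueId; // assumed to change when the pod restarts

    HostInfo(String host, int port, String uniqueId) {
        this.host = Objects.requireNonNull(host);
        this.port = port;
        this.uniqueId = Objects.requireNonNull(uniqueId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof HostInfo)) {
            return false;
        }
        HostInfo other = (HostInfo) o;
        // The unique ID is part of identity, so same host+port is not enough.
        return port == other.port
            && host.equals(other.host)
            && uniqueId.equals(other.uniqueId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, uniqueId);
    }

    @Override
    public String toString() {
        return host + ":" + port + "/" + uniqueId;
    }
}

// Hypothetical cache modeled on the knownSrcHosts map mentioned in the issue.
// The String value is a placeholder for whatever per-host state is cached.
final class KnownSrcHostsSketch {
    private final Map<HostInfo, String> knownSrcHosts = new ConcurrentHashMap<>();

    // Returns the cached entry for this key, creating one if none exists.
    String lookupOrCreate(HostInfo key) {
        return knownSrcHosts.computeIfAbsent(key, k -> "state for " + k);
    }

    int size() {
        return knownSrcHosts.size();
    }
}
```

With a key like this, looking up ("pod-0", 15551, "uid-a") and then ("pod-0", 15551, "uid-b") produces two separate entries, so state cached for the dead pod is not reused for its replacement. Evicting the stale entry is a separate concern that this sketch does not cover.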
--
This message was sent by Atlassian Jira
(v8.3.4#803005)