[ https://issues.apache.org/jira/browse/TEZ-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116312#comment-17116312 ]

Ashutosh Chauhan commented on TEZ-4181:
---------------------------------------

+1

> [Kubernetes] Use hostname + pod UID for shuffle manager caching
> ---------------------------------------------------------------
>
>                 Key: TEZ-4181
>                 URL: https://issues.apache.org/jira/browse/TEZ-4181
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>         Attachments: TEZ-4181.patch
>
>
> When a pod restarts, it reuses the same hostname and shuffle port. When
> fetcher threads connect to download shuffle data, they use the cached
> connection info, but because the pod died, its shuffle data has been
> cleaned up. After the restart, the pod receives connections from clients
> requesting specific shuffle data, but the daemon no longer has that data.
> In ShuffleManager.java, the key of knownSrcHosts should be changed to
> HostInfo, which combines host+port with the host's unique ID. The host's
> unique ID changes when a node is killed or restarted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
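
The keying change proposed in the description can be illustrated with a minimal sketch. This is not the code from TEZ-4181.patch: the `HostInfo` class below is a hypothetical stand-in with assumed fields (`host`, `port`, `uniqueId`), and `ShuffleHostCacheSketch` with its `lookupOrCreate` method is invented for illustration only. It shows why adding a per-incarnation ID (for example a pod UID) to the key keeps a restarted pod from matching the entry cached for its previous incarnation.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical composite cache key: host + port + a per-incarnation unique ID.
// This is an illustration, not the actual Tez HostInfo class.
final class HostInfo {
    private final String host;
    private final int port;
    private final String uniqueId;

    HostInfo(String host, int port, String uniqueId) {
        this.host = Objects.requireNonNull(host);
        this.port = port;
        this.uniqueId = Objects.requireNonNull(uniqueId);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof HostInfo)) return false;
        HostInfo other = (HostInfo) o;
        // All three parts must match; a restarted pod has a different uniqueId.
        return port == other.port
                && host.equals(other.host)
                && uniqueId.equals(other.uniqueId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, uniqueId);
    }
}

// Hypothetical stand-in for a cache such as knownSrcHosts.
// Cached values are plain strings here to keep the sketch self-contained.
final class ShuffleHostCacheSketch {
    private final Map<HostInfo, String> knownSrcHosts = new ConcurrentHashMap<>();

    // Returns the cached entry for this exact host incarnation, creating it if absent.
    String lookupOrCreate(String host, int port, String uniqueId) {
        HostInfo key = new HostInfo(host, port, uniqueId);
        return knownSrcHosts.computeIfAbsent(key, k -> "state-for-" + uniqueId);
    }

    int size() {
        return knownSrcHosts.size();
    }
}
```

With a key of host+port alone, both incarnations of a pod would map to the same entry. With the unique ID included, a lookup for the restarted pod misses the stale entry and creates a fresh one. A real implementation would presumably also need to evict entries for dead incarnations, which this sketch does not attempt.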