Attila Magyar created TEZ-4181:
----------------------------------
Summary: [Kubernetes] Use hostname + pod UID for shuffle manager caching
Key: TEZ-4181
URL: https://issues.apache.org/jira/browse/TEZ-4181
Project: Apache Tez
Issue Type: Bug
Reporter: Attila Magyar
Assignee: Attila Magyar
When a pod restarts, it comes back with the same hostname and shuffle port,
but its shuffle data was cleaned up when the old pod died. Fetcher threads
that connect to download shuffle data still use the cached connection info,
so the restarted pod receives requests from clients for specific shuffle data
that the daemon no longer has because of the restart.
In ShuffleManager.java, the key of knownSrcHosts should be changed to HostInfo,
which is a combination of host+port and the host's unique ID. The host ID
changes when a node is killed or restarted.
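A minimal sketch of the idea, not the actual Tez change: the class HostKey,
the map value type, and the host name, port and UID values below are all
hypothetical stand-ins. It only illustrates that once the unique ID is part
of the key's equals/hashCode, a restarted pod with the same host+port no
longer matches the cached entry.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class ShuffleHostKeyDemo {

    // Hypothetical cache key (stand-in for the HostInfo described above):
    // host + port plus a unique ID that changes on every restart.
    static final class HostKey {
        private final String host;
        private final int port;
        private final String uniqueId; // assumed to be e.g. the pod UID

        HostKey(String host, int port, String uniqueId) {
            this.host = Objects.requireNonNull(host);
            this.port = port;
            this.uniqueId = Objects.requireNonNull(uniqueId);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof HostKey)) {
                return false;
            }
            HostKey other = (HostKey) o;
            // All three fields take part in equality, so a new unique ID
            // yields a different key even when host and port are unchanged.
            return port == other.port
                && host.equals(other.host)
                && uniqueId.equals(other.uniqueId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, uniqueId);
        }
    }

    public static void main(String[] args) {
        // Simplified stand-in for knownSrcHosts; the value type is made up.
        Map<HostKey, String> knownSrcHosts = new ConcurrentHashMap<>();

        HostKey beforeRestart = new HostKey("tez-pod-0", 15551, "uid-aaa");
        knownSrcHosts.put(beforeRestart, "cached connection info");

        // Same hostname and shuffle port, but a new unique ID after restart.
        HostKey afterRestart = new HostKey("tez-pod-0", 15551, "uid-bbb");

        System.out.println(knownSrcHosts.containsKey(beforeRestart)); // true
        System.out.println(knownSrcHosts.containsKey(afterRestart));  // false
    }
}
```

With a host+port-only key, the second lookup would hit the stale entry, which
is the failure described in this issue.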