nicknezis commented on issue #3542: URL: https://github.com/apache/incubator-heron/issues/3542#issuecomment-650497830
I've discussed this with @windhamwong and we have come up with a proposed approach to handling Heron topologies in Kubernetes. We found that it does work, but there are some edge cases that can cause the topology StatefulSet to fail.

1. TMaster looks for `/.dockerenv` to determine whether it is running in a container. I have found situations in which the pod does not have this file (e.g. Kind and K3s). [TMaster code](https://github.com/apache/incubator-heron/blob/cc815d85305dc0b665a2ccb42113cf7a49b1eb0a/heron/executor/src/python/heron_executor.py#L232)
2. If TMaster does find `/.dockerenv`, it will try to use the `HOST` environment variable. I have found some cases in which the pod does not have this set (e.g. Kind).
3. If both of these work, the TMaster and Stmgr processes will use the pod's IP address. If either fails, the `socket.gethostname()` call will return the pod name, which is not registered in the Kubernetes cluster DNS.
4. To enable the use of the hostname, we need to have a Headless Service registered.

The proposal:

1. Update the Kubernetes Scheduler code to create a matching Headless Service for each topology created.
2. Update the Kubernetes Scheduler code to add a custom environment variable to the StatefulSet (e.g. `HERON_HOSTNAME`).
3. Update the TMaster logic that checks for `/.dockerenv` to first check for the `HERON_HOSTNAME` variable instead.

If we make these changes, this issue would be resolved.
