On Mon, Dec 14, 2015 at 11:55 AM, Jagadish Venkatraman < jagadish1...@gmail.com> wrote:
> > Thanks for the great work! This is super-helpful. Another cool feature is > that this implementation pushes alot of failure handling/restarting to the > cluster manager. > > I've some questions on the implementation. > > 1. It's my understanding that each container is a separate pod. When a pod > crashes, I assume that the *Kubelet* would bring back the job on the > **same** > node? (and we can re-use state since the volume is mounted as an *emptyDir* > volume) This is valuable for statefull jobs. > Correct. I create an emptyDir state volume on purpose, so that if the Docker container dies for some reason, the local state won't go away and will be available to the Samza container when it is restarted by Kubelet. I do the same thing for the logs, so you can fetch them and debug any issues that may be causing the container to fail. > 2. Also, the containerId and the jobModel are a part of the env. variable > passed to the pod. How do we ensure that - after a restart, the pods that > come up (potentially on a different host), have the same containerId? (Does > Kubernetes help in this by launching the process with the same env. > variables as the original run? ) > > This behavior is key to ensure that we process all partitions with no 2 > containers processing the same partition. > At the moment Kubernetes does not have a specific abstraction to model stateful services that have unique identities within the cluster (e.g. a Samza job, a Kafka cluster, a ZooKeeper ensemble), but it is easy to work past this limitation. You do so by creating one ReplicationController with a replica count of one per group member. In the case of a Samza job, my code creates a RC per SamzaContainer. Each RC has a slightly different Pod spec. Each Pod is parametrized a unique Samza container ID via the SAMZA_CONTAINER_ID environment variable. So the Kubernetes YAML the code outputs ensures there is only one RC per container ID, and the RC ensures there is only ever a single container running for its given configuration. Makes sense?