Hi Elias,

Thank you so much for your explanation. I'm looking into improving Samza Standalone, and your design inputs were very useful.
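To make sure I understand the ReplicationController-per-container layout you describe below, here is a rough sketch of what I picture one of the generated RCs looking like. The job name, image, labels, and mount paths are placeholders I made up; only the general shape (replicas: 1, a fixed SAMZA_CONTAINER_ID, and emptyDir volumes for state and logs) comes from your description, so please correct me if the real output differs:

apiVersion: v1
kind: ReplicationController
metadata:
  name: samza-job-container-0        # hypothetical name; one RC per container ID
spec:
  replicas: 1                        # exactly one pod per Samza container
  selector:
    app: samza-job                   # placeholder labels
    samza-container-id: "0"
  template:
    metadata:
      labels:
        app: samza-job
        samza-container-id: "0"
    spec:
      containers:
      - name: samza-container
        image: registry.example.com/samza-job:latest   # placeholder image
        env:
        - name: SAMZA_CONTAINER_ID   # fixed per RC, as you describe
          value: "0"
        volumeMounts:
        - name: samza-state
          mountPath: /var/lib/samza/state              # placeholder path
        - name: samza-logs
          mountPath: /var/log/samza                    # placeholder path
      volumes:
      - name: samza-state
        emptyDir: {}                 # local state survives container restarts on the node
      - name: samza-logs
        emptyDir: {}                 # logs stay around for debugging failed containers

If I'm reading it right, pinning replicas to 1 and baking the container ID into each RC's pod template is what guarantees no two pods ever claim the same set of partitions.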
Cheers,
Jagadish

On Mon, Dec 14, 2015 at 12:47 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:

> On Mon, Dec 14, 2015 at 11:55 AM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:
>
> > Thanks for the great work! This is super-helpful. Another cool feature is
> > that this implementation pushes a lot of failure handling/restarting to
> > the cluster manager.
> >
> > I have some questions on the implementation.
> >
> > 1. It's my understanding that each container is a separate pod. When a
> > pod crashes, I assume that the *Kubelet* would bring the job back up on
> > the **same** node? (and we can re-use state since the volume is mounted
> > as an *emptyDir* volume) This is valuable for stateful jobs.
>
> Correct. I create an emptyDir state volume on purpose, so that if the
> Docker container dies for some reason, the local state won't go away and
> will be available to the Samza container when it is restarted by the
> Kubelet. I do the same thing for the logs, so you can fetch them and debug
> any issues that may be causing the container to fail.
>
> > 2. Also, the containerId and the jobModel are part of the env. variables
> > passed to the pod. How do we ensure that, after a restart, the pods that
> > come up (potentially on a different host) have the same containerId?
> > (Does Kubernetes help with this by launching the process with the same
> > env. variables as the original run?)
> >
> > This behavior is key to ensuring that we process all partitions with no
> > two containers processing the same partition.
>
> At the moment Kubernetes does not have a specific abstraction to model
> stateful services that have unique identities within the cluster (e.g. a
> Samza job, a Kafka cluster, a ZooKeeper ensemble), but it is easy to work
> past this limitation. You do so by creating one ReplicationController with
> a replica count of one per group member. In the case of a Samza job, my
> code creates an RC per SamzaContainer. Each RC has a slightly different
> Pod spec. Each Pod is parametrized with a unique Samza container ID via
> the SAMZA_CONTAINER_ID environment variable. So the Kubernetes YAML the
> code outputs ensures there is only one RC per container ID, and the RC
> ensures there is only ever a single container running for its given
> configuration.
>
> Makes sense?

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University