On Mon, Dec 14, 2015 at 11:55 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

>
> Thanks for the great work! This is super-helpful. Another cool feature is
> that this implementation pushes alot of failure handling/restarting to the
> cluster manager.
>
> I've some questions on the implementation.
>
> 1. It's my understanding that each container is a separate pod. When a pod
> crashes, I assume that the *Kubelet* would bring back the job on the
> **same**
> node? (and we can re-use state since the volume is mounted as an *emptyDir*
> volume) This is valuable for statefull jobs.
>

Correct.  I create an emptyDir state volume on purpose, so that if the
Docker container dies for some reason, the local state won't go away and
will be available to the Samza container when it is restarted by Kubelet.
I do the same thing for the logs, so you can fetch them and debug any
issues that may be causing the container to fail.



> 2. Also, the containerId and the jobModel are a part of the env. variable
> passed to the pod. How do we ensure that - after a restart, the pods that
> come up (potentially on a different host), have the same containerId? (Does
> Kubernetes help in this by launching the process with the same env.
> variables as the original run? )
>
> This behavior is key to ensure that we process all partitions with no 2
> containers processing the same partition.
>

At the moment Kubernetes does not have a specific abstraction to model
stateful services that have unique identities within the cluster (e.g. a
Samza job, a Kafka cluster, a ZooKeeper ensemble), but it is easy to work
past this limitation.  You do so by creating one ReplicationController with
a replica count of one per group member.  In the case of a Samza job, my
code creates a RC per SamzaContainer.  Each RC has a slightly different Pod
spec.  Each Pod is parametrized a unique Samza container ID via the
SAMZA_CONTAINER_ID environment variable.  So the Kubernetes YAML the code
outputs ensures there is only one RC per container ID, and the RC ensures
there is only ever a single container running for its given configuration.

Makes sense?

Reply via email to