Thanks for starting this discussion, and great to see the integration with
Kubernetes.  It is very timely because we (at LinkedIn) have recently
started to refocus on SAMZA-516 and were thinking of kicking off a similar
discussion soon.

I am in agreement with you that there are two possible levels to the
standalone Samza (or rather pluggable cluster manager) implementation.

1. Let the cluster manager take care of fault tolerance for "container"
failures.
   - i.e. if the machine for a container dies, we leave it up to the cluster
manager of choice to start the container up on another machine.  No attempt
is made to redistribute the partitions among the existing containers.

2.  The partition-to-container mapping gets dynamically adjusted to deal
with "container/node" death.
- Your earlier comment about making Samza act more like the Kafka consumer
is actually more in line with this model (see the sketch below).  This is
also what Chris had implemented in SAMZA-516.
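
To make the analogy concrete, here is a minimal sketch of how the new Kafka
consumer reacts to group membership changes via a rebalance listener.  This
is purely illustrative of model (2), not a proposal for the Samza
implementation; the broker address, topic, and group names are made up.

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RebalanceExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("group.id", "example-group");           // made-up group
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("page-views"),
                new ConsumerRebalanceListener() {
                    // Called before partitions are taken away from this member,
                    // e.g. because another consumer in the group died or joined.
                    public void onPartitionsRevoked(Collection<TopicPartition> parts) {
                        // flush/commit state for the partitions we are losing
                    }
                    // Called after the coordinator hands this member its new share.
                    public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                        // restore state for the partitions we just picked up
                    }
                });
            // poll loop elided
        }
    }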

The way I see it, (2) should be an additive (and possibly opt-in)
improvement over (1).

In your last email you seem to indicate that the additional functionality
of dynamic partition/container re-assignment on container/node failure
that is implemented in SAMZA-516 is problematic for the Kubernetes
integration.  Could you provide some more detail on why this doesn't work
for you?

From my standpoint, there is another reason it might be attractive to
separate (2) from (1): engineering expediency.  (2) adds a lot more
complexity, and I feel it will take much longer to stabilize.  This is even
more true now that host-affinity is also fully implemented in Samza 0.10.
(1), on the other hand, is easier to stabilize and still covers most of the
scenarios.

Since we are on the subject, it would be good to elaborate further on (1).
Here is how I think of it: when the job starts for the first time, the
system would dynamically distribute all the partitions across the
configured number of containers.  So the only thing that is statically
configured here is the number of containers (same as how things are
today).  This mapping would be stored durably using the same mechanism
that was introduced in Samza 0.10 (the coordinator stream).
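
To make that concrete, here is a minimal sketch of what the initial
assignment could look like.  The class and method names are hypothetical
(this is not the actual Samza JobCoordinator code), and a simple
round-robin policy stands in for whatever grouper the job is configured
with.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class PartitionAssigner {
        // Round-robin partitions over a statically configured container
        // count.  Hypothetical sketch, not actual Samza code.
        public static Map<Integer, List<Integer>> assignPartitions(
                int numContainers, int numPartitions) {
            Map<Integer, List<Integer>> mapping = new HashMap<>();
            for (int c = 0; c < numContainers; c++) {
                mapping.put(c, new ArrayList<Integer>());
            }
            for (int p = 0; p < numPartitions; p++) {
                mapping.get(p % numContainers).add(p);
            }
            return mapping;
        }
    }

On first start the coordinator would compute this once and write it to the
coordinator stream; on every restart it would read the persisted mapping
back rather than recompute it, so restarted containers keep their
partitions.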

One further addition to this model would be to deal with explicit container
additions and removals.  For example, if a container is explicitly added or
removed via a config change, the system would recompute the partition-to-
container mapping (and persist it durably back to the coordinator stream),
as sketched below.
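
Continuing the hypothetical sketch above, a config-driven change to the
container count could be handled by recomputing and re-persisting the
mapping.  persistToCoordinatorStream is a stub standing in for the real
coordinator-stream writer:

    // Hypothetical continuation of the PartitionAssigner sketch above.
    public static void onContainerCountChange(int newContainerCount,
                                              int numPartitions) {
        Map<Integer, List<Integer>> newMapping =
            PartitionAssigner.assignPartitions(newContainerCount, numPartitions);
        persistToCoordinatorStream(newMapping); // stub: write mapping durably
    }

    private static void persistToCoordinatorStream(
            Map<Integer, List<Integer>> mapping) {
        // In a real implementation this would serialize the mapping and
        // produce it to the job's coordinator stream (the Samza 0.10
        // mechanism), so that all containers see the new assignment.
    }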

Thanks
Kartik



On Mon, Nov 30, 2015 at 5:54 PM, Elias Levy <fearsome.lucid...@gmail.com>
wrote:

> BTW, I reviewed the StandaloneJob proposal in SAMZA-516 thinking it could
> have been useful for running Samza on Kubernetes, and I was disappointed to
> find that it was not the case. StandaloneJob goes a step beyond the most
> basic job implementation, as it attempts to handle failures through
> rebalances managed by the JobCoordinator, which is elected via ZooKeeper.
>
> I submit that there is also a need for a simpler type of job; let's call it
> SimpleJob.  This would be a job where, at least until dynamic configuration
> is implemented, the JobCoordinator is executed once, a priori, to generate a
> JobModel; the user can then configure the JobModel in the containers'
> environment and execute the containers as he sees fit (see the sketch
> below).
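>
> A rough sketch of such a container entry point (the class name, env var
> names, and the JobModel/SamzaContainer calls are all hypothetical stand-ins
> for whatever the real runner ends up being):
>
>     public class SimpleJobContainer {
>         public static void main(String[] args) {
>             // The JobModel was generated once, ahead of time, by the
>             // JobCoordinator and injected into the environment by the
>             // orchestration system.
>             String jobModelJson = System.getenv("SAMZA_JOB_MODEL");
>             String containerId = System.getenv("SAMZA_CONTAINER_ID");
>             // Hypothetical calls: deserialize the model and run the
>             // container; restarts on failure are the orchestrator's job.
>             JobModel jobModel = JobModel.fromJson(jobModelJson);
>             SamzaContainer.run(jobModel, containerId);
>         }
>     }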
>
> Failure handling and container monitoring are then left to be handled by
> whatever system the user chooses to execute the containers.
>
> This model can then be used with many of the existing container
> orchestration systems, whether it be Kubernetes, Docker Swarm, Amazon EC2
> Container Service, CoreOS Fleet, Helios, Marathon, Nomad, etc.
>
> In fact, KubernetesJob is essentially this SimpleJob proposal, except that
> it outputs a Kubernetes config file instead of only the JobModel.
>
>
>
> On Mon, Nov 30, 2015 at 12:31 PM, Jakob Homan <jgho...@gmail.com> wrote:
>
> >   This is awesome work.  Would you be interested in opening JIRAs for the
> > changes you need so we can start to process them?
>
