I was too critical in my last comment on StandaloneJob.  More specifically,
StandaloneJob did not help much with my implementation, which is based on the
current 0.9.1 model of static, a priori partition assignment.  Within that
model all that is needed is a command that takes in the job config and
spits out the serialized JobModel, and an easy way to pass that JobModel
to each SamzaContainer, so that the SamzaContainers can be started within
whatever container execution framework you've chosen.
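
To make the static model concrete, here is a rough Python sketch of such a
command.  The names build_job_model and partitions_for, the config keys, the
round-robin grouping, and the JSON layout are all assumptions made for
illustration; none of this is Samza's actual JobModel format or API.

```python
import json


def build_job_model(config):
    """Assign partitions to containers once, before the job starts.

    `config` is a hypothetical dict with "container_count" and
    "partition_count" keys; it is not Samza's real config format.
    """
    container_count = config["container_count"]
    partition_count = config["partition_count"]
    if container_count < 1:
        raise ValueError("container_count must be at least 1")

    # Round-robin: partition p goes to container p % container_count.
    containers = {str(cid): [] for cid in range(container_count)}
    for partition in range(partition_count):
        containers[str(partition % container_count)].append(partition)

    # The serialized model is what would be handed to every container.
    return json.dumps({"containers": containers}, sort_keys=True)


def partitions_for(serialized_model, container_id):
    """What a single container would look up at startup."""
    model = json.loads(serialized_model)
    return model["containers"][str(container_id)]


model = build_job_model({"container_count": 3, "partition_count": 8})
print(partitions_for(model, 1))  # partitions owned by container 1
```

The serialized string could then be placed in each container's environment,
and the execution framework only has to keep N identical containers running.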

That said, yes, I do believe dynamic assignment should be implemented.  Not
because of fault-tolerance, which I think is the domain of whatever
container execution system is used to implement the job, but to enable
scaling up and down of a job or the dynamic addition and removal of
partitions without having to restart the job.

Dynamic assignment would also remove the need in many systems for a JobRunner
that requires access to the cluster services, making it easier to bootstrap
a job using only a config file and a number of immutable SamzaContainers
that implement the job.  I think we all agree that Samza's current job
bootstrapping is convoluted.

I agree with you that work on dynamic assignment is additive to work on a
simpler job that is statically configured on startup.

One aspect of the StandaloneJob proposal that was somewhat off-putting was
that the rebalancing was performed by shifting SamzaContainers around
rather than shifting partitions among containers.  That adds an extra layer
of conceptual complexity.  I understand that from an implementation
perspective the easiest thing to do is to shift SamzaContainers around, as
they appear to be immutable after the job starts and changing that may
require a lot of work, but conceptually it seems to make more sense to
shift partitions among SamzaContainers and execute a single SamzaContainer
per "container" (e.g. YARN container, Docker container, etc).  Then again,
that is an implementation detail that would be largely hidden from the job
developer, although it would be exposed to the job admin, as the number of
SamzaContainers would become his parallelism limit.
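
As a rough sketch of what I mean by shifting partitions among containers,
the Python below recomputes a partition-to-container mapping when the set of
containers changes, while moving as few partitions as it can.  The function
name rebalance and the data layout are hypothetical; this is not how
SAMZA-516 or the Kafka consumer implements rebalancing.

```python
def rebalance(assignment, container_ids):
    """Return a new partition -> container mapping for `container_ids`.

    `assignment` maps partition -> current container id.  Partitions stay
    on their current container where possible; only orphaned partitions
    and the excess on overloaded containers are moved.
    """
    container_ids = sorted(container_ids)
    if not container_ids:
        raise ValueError("need at least one container")

    owned = {cid: [] for cid in container_ids}
    unassigned = []
    for partition in sorted(assignment):
        owner = assignment[partition]
        if owner in owned:
            owned[owner].append(partition)
        else:
            unassigned.append(partition)  # its container is gone

    base, extra = divmod(len(assignment), len(container_ids))
    # Give the larger quotas to the most loaded containers so fewer
    # partitions have to move.
    by_load = sorted(container_ids, key=lambda cid: -len(owned[cid]))
    quota = {cid: base + (1 if i < extra else 0)
             for i, cid in enumerate(by_load)}

    # Trim containers that are over quota, then fill those under quota.
    for cid in container_ids:
        while len(owned[cid]) > quota[cid]:
            unassigned.append(owned[cid].pop())
    for cid in container_ids:
        while len(owned[cid]) < quota[cid]:
            owned[cid].append(unassigned.pop())

    return {p: cid for cid, parts in owned.items() for p in parts}


before = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
after = rebalance(before, ["a", "b", "c"])  # container "c" was added
print(sorted(p for p in before if before[p] != after[p]))  # moved partitions
```

In this sketch each container would have to accept a changed partition list
at runtime, which is exactly the mutability that SamzaContainer appears to
lack today.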


On Mon, Nov 30, 2015 at 10:14 PM, Kartik Paramasivam <
kparamasi...@linkedin.com.invalid> wrote:

> Thanks for starting this discussion and great to see the integration with
> Kubernetes.  It is very timely because we (at LinkedIn) have recently
> started to refocus on SAMZA-516 and were thinking of kicking off a similar
> discussion soon.
>
> I am in agreement with you that there are two possible levels to the
> standalone Samza (or rather pluggable cluster manager) implementation.
>
> 1. Let the cluster management take care of fault-tolerance for "container"
> failures.
>    - i.e. if the machine for a container dies, we leave it up to the cluster
> manager of choice to start up the container on another machine.  No attempt
> is made to redistribute the partitions among the existing containers.
>
> 2.  The partition to container mapping gets dynamically adjusted to deal
> with "container/node" death.
> - Your earlier comment about making Samza act more like the Kafka consumer
> is actually more in line with this model.  This is also what Chris had
> implemented in SAMZA-516.
>
> The way I see it, (2) should be an additive (and possibly opt-in)
> improvement over (1).
>
> In your last email you seem to indicate that the additional functionality
> of dynamic partition/container re-assignment on container/node failures
> that is implemented in SAMZA-516 is problematic for Kubernetes
> integration.  Could you provide some more details on why this doesn't work
> for you?
>
> From my standpoint, there is another reason it might be attractive to
> separate (2) from (1): engineering expediency, i.e. (2) adds a lot more
> complexity and I feel it will take much longer to stabilize.  This is more
> true now that we have host-affinity also fully implemented in Samza 0.10.
> (1) on the other hand is easier to stabilize and still covers most of the
> scenarios.
>
> Since we are on the subject, it would be good to further elaborate on (1).
> Here is how I think of (1).
> Basically when the job starts for the first time, the system would
> dynamically distribute all the partitions across the configured number of
> containers.  So the only thing that is statically configured here is the
> number of containers (same as how things are today).   This mapping would
> be stored durably using the same mechanism that has been introduced in
> Samza 0.10 (coordinator stream).
>
> One further addition to this model would be to deal with explicit container
> additions and removals.  For example, if a container is explicitly added or
> removed via a config change, the system would change the partition to
> container mapping (and persist it durably back in the coordinator stream).
>
> Thanks
> Kartik
>
>
>
> On Mon, Nov 30, 2015 at 5:54 PM, Elias Levy <fearsome.lucid...@gmail.com>
> wrote:
>
> > BTW, I reviewed the StandaloneJob proposal in SAMZA-516 thinking it could
> > have been useful in running Samza in Kubernetes and I was disappointed to
> > find out it was not the case. StandaloneJob goes a step beyond the most
> > basic job implementation as it attempts to handle failures through
> > rebalances managed by the JobCoordinator, which is elected via ZooKeeper.
> >
> > I submit that there is also a need for a simpler type of job, let's call it
> > SimpleJob.  This would be a job where, at least until dynamic configuration
> > is implemented, the JobCoordinator is executed once a priori to generate a
> > JobModel, the user can then configure the JobModel in the containers'
> > environment, and execute the containers as he sees fit.
> >
> > Failure handling and container monitoring is then left as something to be
> > handled by whatever system the user chooses to execute the containers.
> >
> > This model can then be used with many of the existing container
> > orchestration systems, whether it be Kubernetes, Docker Swarm, Amazon EC2
> > Container Service, CoreOS Fleet, Helios, Marathon, Nomad, etc.
> >
> > In fact, KubernetesJob is essentially this SimpleJob proposal, except that
> > it outputs a Kubernetes config file instead of only the JobModel.
> >
> >
> >
> > On Mon, Nov 30, 2015 at 12:31 PM, Jakob Homan <jgho...@gmail.com> wrote:
> >
> > >   This is awesome work.  Would you be interested in opening JIRAs for the
> > > changes you need so we can start to process them?
> >
>
