Roger,

You are welcome.  If you want to experiment, you can use my hello-samza
<https://hub.docker.com/r/elevy/hello-samza/> Docker image.

On Sun, Nov 29, 2015 at 12:19 PM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Elias,
>
> I would also love to be able to deploy Samza on Kubernetes with dynamic
> task management.  Thanks for sharing this.  It may be a good interim
> solution.
>
> Roger
>
> On Sun, Nov 29, 2015 at 11:18 AM, Elias Levy <fearsome.lucid...@gmail.com>
> wrote:
>
> > I've been exploring Samza for stream processing as well as Kubernetes as
> > a container orchestration system, and I wanted to be able to use one
> > with the other.  The prospect of having to run YARN either alongside or
> > on top of Kubernetes did not appeal to me, so I developed a KubernetesJob
> > implementation of SamzaJob.
> >
> > You can find the details at
> > https://github.com/eliaslevy/samza_kubernetes, but in summary
> > KubernetesJob executes and generates a serialized JobModel.  Instead of
> > interacting with Kubernetes directly to create the SamzaContainers (as
> > the YarnJob's SamzaApplicationMaster does with the YARN RM), it outputs a
> > config YAML file that can be used to create the SamzaContainers in
> > Kubernetes by using Replication Controllers.  This requires packaging
> > your job as a Docker image.  See the README in the above repo for
> > details.
> >
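
To make the manifest step above concrete, here is a rough sketch of what
emitting one controller per SamzaContainer could look like.  It is not the
code from the repo: the function names, the labels, the image name, and the
SAMZA_JOB_MODEL and SAMZA_CONTAINER_ID variable names are assumptions for
illustration.  It emits JSON rather than YAML only to stay within the
standard library; kubectl accepts either.

```python
import json


def replication_controller(job_name, container_id, image, job_model_json):
    """Build a v1 ReplicationController dict for one SamzaContainer."""
    name = f"{job_name}-{container_id}"
    # Hypothetical labels; any label set works if selector and template agree.
    labels = {"samza-job": job_name, "samza-container": str(container_id)}
    return {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,  # exactly one pod per SamzaContainer
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "samza-container",
                        "image": image,
                        "env": [
                            # Assumed variable names, not confirmed by the repo.
                            {"name": "SAMZA_CONTAINER_ID",
                             "value": str(container_id)},
                            {"name": "SAMZA_JOB_MODEL",
                             "value": job_model_json},
                        ],
                    }],
                },
            },
        },
    }


def manifests(job_name, container_count, image, job_model_json):
    """One controller per container, all carrying the same serialized model."""
    return [
        replication_controller(job_name, i, image, job_model_json)
        for i in range(container_count)
    ]


def render(job_name, container_count, image, job_model_json):
    """Serialize the manifests as a JSON list document."""
    items = manifests(job_name, container_count, image, job_model_json)
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items},
                      indent=2)
```

Writing the result of render("wikipedia-parser", 3, "example/hello-samza",
model) to a file and passing it to kubectl create -f would yield
controllers named wikipedia-parser-0 through wikipedia-parser-2.
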
> > A few observations:
> >
> > It would be useful if SamzaContainer accepted the JobModel via an
> > environment variable.  Right now it expects a URL to download it from.  I
> > get around this by using an entry point script that copies the model from
> > an environment variable into a file, then passes a file URL to
> > SamzaContainer.
> >
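
The entry point workaround described above could be sketched roughly as
follows.  This is not the script from the image: the SAMZA_JOB_MODEL and
SAMZA_COORDINATOR_URL variable names and the bin/run-container.sh launcher
path are assumptions for illustration.

```python
import os
import pathlib
import tempfile


def write_job_model(env_var="SAMZA_JOB_MODEL", directory=None):
    """Copy the serialized JobModel from the environment into a file.

    Returns a file:// URL that can be handed to the container in place of
    an HTTP URL.  The variable name is an assumption.
    """
    model = os.environ[env_var]
    target_dir = pathlib.Path(directory or tempfile.mkdtemp())
    path = target_dir / "job-model.json"
    path.write_text(model, encoding="utf-8")
    # as_uri() needs an absolute path, hence resolve().
    return path.resolve().as_uri()


def main():
    """Write the model, then replace this process with the container."""
    url = write_job_model()
    # Assumed: the launcher reads the model URL from this variable.
    env = dict(os.environ, SAMZA_COORDINATOR_URL=url)
    # Assumed launcher path inside the image.
    os.execvpe("bin/run-container.sh", ["bin/run-container.sh"], env)
```

A real entry point would simply call main().  Using exec rather than a
child process keeps the SamzaContainer as the pod's main process, so
signals from Kubernetes reach it directly.
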
> > SamzaContainer doesn't allow you to configure the JMX port.  It selects a
> > port at random from the ephemeral range as it expects to execute in YARN
> > where a static port could result in a conflict.  This is not the case in
> > Kubernetes where each Pod (i.e. SamzaContainer) is given its own IP
> > address.
> >
> > This implementation doesn't provide a Samza dashboard, which in the YARN
> > implementation is hosted in the Application Master.  There didn't seem to
> > be much value provided by the dashboard that is not already provided by
> > the Kubernetes tools for monitoring pods.
> >
> > I've successfully executed the hello-samza jobs in Kubernetes:
> >
> > $ kubectl get po
> > NAME                       READY     STATUS    RESTARTS   AGE
> > kafka-1-jjh8n              1/1       Running   0          2d
> > kafka-2-buycp              1/1       Running   0          2d
> > kafka-3-tghkp              1/1       Running   0          2d
> > wikipedia-feed-0-4its2     1/1       Running   0          1d
> > wikipedia-parser-0-l0onv   1/1       Running   0          17h
> > wikipedia-parser-1-crrxh   1/1       Running   0          17h
> > wikipedia-parser-2-1c5nn   1/1       Running   0          17h
> > wikipedia-stats-0-3gaiu    1/1       Running   0          16h
> > wikipedia-stats-1-j5qlk    1/1       Running   0          16h
> > wikipedia-stats-2-2laos    1/1       Running   0          16h
> > zookeeper-1-1sb4a          1/1       Running   0          2d
> > zookeeper-2-dndk7          1/1       Running   0          2d
> > zookeeper-3-46n09          1/1       Running   0          2d
> >
> >
> > Finally, accessing services within the Kubernetes cluster from the
> > outside is quite cumbersome unless one uses an external load balancer.
> > This makes it difficult to bootstrap a job, as SamzaJob must connect to
> > ZooKeeper and Kafka to find out the number of partitions on the topics it
> > will subscribe to, so it can assign them statically among the number of
> > containers requested.
> >
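
To illustrate why the partition counts are needed up front, here is a
minimal sketch of a static assignment.  It uses plain round-robin, which is
an assumption chosen for simplicity and not necessarily the grouping
strategy Samza applies; the function name is hypothetical.

```python
def assign_partitions(partition_count, container_count):
    """Statically spread partition ids across containers, round-robin."""
    if container_count < 1:
        raise ValueError("at least one container is required")
    assignment = {container: [] for container in range(container_count)}
    for partition in range(partition_count):
        assignment[partition % container_count].append(partition)
    return assignment
```

For example, assign_partitions(8, 3) gives {0: [0, 3, 6], 1: [1, 4, 7],
2: [2, 5]}.  Both inputs have to be known before any container starts, and
changing the container count reshuffles the whole mapping.
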
> > Ideally Samza would operate along the lines of the Kafka high-level
> > consumer, whose instances dynamically coordinate to allocate work among
> > members of a consumer group.  This would do away with the need to execute
> > SamzaJob a priori to generate the JobModel to pass to the
> > SamzaContainers.  It would also allow for dynamically changing the number
> > of containers without having to shut down the job.
> >
>
