Hi, Selina,

As Gian mentioned, the first step in setting up a real-time stream
processing environment is to: a) set up a Kafka cluster; b) set up a YARN
cluster. The following links may help you get started:
https://www.linkedin.com/pulse/20140813032057-89781742-deploy-kafka-cluster-on-aws
http://blog.c2b2.co.uk/2014/05/hadoop-v2-overview-and-cluster-setup-on.html

-Yi

On Tue, Aug 4, 2015 at 5:58 PM, Job-Selina Wu <swucaree...@gmail.com> wrote:

> Dear All: I was looking for a tutorial on how to build and run Samza on
> AWS and found the link below. I am wondering if there is a detailed
> tutorial about how to build Samza on AWS?
>
> Sincerely,
> Selina
>
>
> https://cwiki.apache.org/confluence/display/SAMZA/FAQ#FAQ-HowshouldSamzaberunonAWS?
> How should Samza be run on AWS?
>
> From Gian Merlino:
>
>    - We've been using Samza in production on AWS for a little over a
>    month. We're just using the YARN runner on a mostly stock hadoop 2.4.0
>    cluster (not EMR). Our experience is that c3s work well for the YARN
>    instances and i2s work well for the Kafka instances. Things have been
>    pretty solid with that setup. For scaling up and scaling down YARN, we
>    just terminate instances or add instances, and this works pretty well.
>    It can take a few minutes for the cluster to realize a node has gone
>    and respawn containers elsewhere. We have a separate Kafka cluster just
>    for Samza's use, different from our main Kafka cluster. The main reason
>    is that we wanted to isolate off the disk and network load of state
>    compactions and restores (we don't use compacted topics in our main
>    Kafka cluster, but we do use them with Samza, and the extra load on
>    Kafka can be substantial).
>
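
For reference, here is a rough sketch of how the setup Gian describes might
look in a Samza job properties file: the job runs on YARN, and the Kafka
system points at a Kafka cluster dedicated to Samza. This is only an
illustration, not Gian's actual configuration. The host names, job name,
store name, and package path are hypothetical placeholders, and the fragment
leaves out the task class and input streams that a real job also needs.

```properties
# Run the job through the YARN job factory.
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
# Hypothetical job name and package location.
job.name=my-samza-job
yarn.package.path=hdfs://namenode.example.com:8020/path/to/my-samza-job-dist.tar.gz

# Kafka system pointing at the Kafka cluster reserved for Samza
# (hypothetical ZooKeeper and broker addresses).
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.consumer.zookeeper.connect=samza-zk.example.com:2181/
systems.kafka.producer.bootstrap.servers=samza-kafka-1.example.com:9092,samza-kafka-2.example.com:9092

# Serde used by the store below.
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

# Local state store (hypothetical name "my-store") backed by a changelog
# topic on the Samza Kafka cluster. Changelog traffic is the compaction
# and restore load that Gian mentions isolating.
stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.my-store.key.serde=string
stores.my-store.msg.serde=string
stores.my-store.changelog=kafka.my-store-changelog
```

Because the changelog is written to the system named "kafka", pointing that
system at a separate cluster keeps the state-related disk and network load
off the main Kafka cluster.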
