Hi all,

It looks like the GitHub deployment used by my university doesn't allow public access, so I moved the repository to github.com (https://github.com/milinda/samza-ec2-ansible).
Thanks
Milinda

On Wed, Aug 5, 2015 at 2:03 PM, Milinda Pathirage <mpath...@umail.iu.edu> wrote:

> I wrote several Ansible playbooks to deploy YARN (without HDFS),
> ZooKeeper, and Kafka to EC2 for deploying Samza jobs. If you know Ansible,
> those scripts may be helpful. You can find them at
> https://github.iu.edu/mpathira/samza-ec2-ansible. I was planning to add a
> document describing these scripts but couldn't do it yet. I looked at EMR
> too, but as I remember, the EMR job deployment model doesn't work with the
> current scripts provided by Samza.
>
> I used R3 instances for Kafka and C3 instances for YARN. As I remember, I
> could get close to 1 million msg/s with a 3-node Kafka cluster running on
> r3.xlarge instances and a 2 (or 4) node YARN cluster running 4 stream
> tasks per job.
>
> Thanks
> Milinda
>
> On Wed, Aug 5, 2015 at 11:27 AM, Gian Merlino <gianmerl...@gmail.com>
> wrote:
>
>> I don't know of any tutorials, but the order to tackle things would be:
>>
>> 1) Set up ZK: this could be a single-node install for a PoC or a 3- or
>> 5-node install for production. m3.medium is a reasonable node type.
>>
>> 2) Set up Kafka: this could be a single instance without replication for
>> a PoC. For production, use as many as you need, and you'd probably want
>> replication. I think if you want to use local instance storage, i2
>> instances are good, and if you want to use EBS, probably m3 instances.
>>
>> 3) Set up YARN: this could be a single instance (running
>> pseudo-distributed with master & slave on the same machine) or two
>> instances (one master, one slave) for a PoC. I think c3 or r3 instance
>> types are good for the slaves, depending on how much memory you need.
>> Workloads without large amounts of state should be OK on c3 instances.
>>
>> EMR might actually work for YARN if you use the long-running kind of
>> cluster (see:
>> http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-longrunning-transient.html).
>> I haven't tried that, but it might be worth a shot before going for stock
>> Apache Hadoop.
>>
>> On Tue, Aug 4, 2015 at 5:58 PM, Job-Selina Wu <swucaree...@gmail.com>
>> wrote:
>>
>> > Dear All: I was looking for a tutorial on how to build and run Samza
>> > on AWS and found the link below. Is there a detailed tutorial on how
>> > to build Samza on AWS?
>> >
>> > Sincerely,
>> > Selina
>> >
>> > https://cwiki.apache.org/confluence/display/SAMZA/FAQ#FAQ-HowshouldSamzaberunonAWS?
>> >
>> > How should Samza be run on AWS?
>> >
>> > From Gian Merlino:
>> >
>> > - We've been using Samza in production on AWS for a little over a
>> > month. We're just using the YARN runner on a mostly stock Hadoop
>> > 2.4.0 cluster (not EMR). Our experience is that c3s work well for
>> > the YARN instances and i2s work well for the Kafka instances. Things
>> > have been pretty solid with that setup. For scaling up and scaling
>> > down YARN, we just terminate instances or add instances, and this
>> > works pretty well. It can take a few minutes for the cluster to
>> > realize a node has gone and respawn containers elsewhere. We have a
>> > separate Kafka cluster just for Samza's use, different from our main
>> > Kafka cluster. The main reason is that we wanted to isolate the disk
>> > and network load of state compactions and restores (we don't use
>> > compacted topics in our main Kafka cluster, but we do use them with
>> > Samza, and the extra load on Kafka can be substantial).
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org

--
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
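
P.S. For anyone skimming the thread, the setup order (ZooKeeper, then Kafka, then YARN) and the instance choices mentioned above can be written down as a small Python sketch. This is only an illustration, not part of the Ansible playbooks: the names Tier, PLAN, deployment_order, and per_broker_rate are hypothetical, the plan mixes Gian's m3.medium suggestion for ZooKeeper with the R3/C3 choices from my test, the "xlarge" size for the C3 nodes is an assumption (the thread only says C3), and the per-broker figure assumes load was spread evenly across brokers, which was not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One layer of the stack: a name, an EC2 instance type, and a node count."""
    name: str
    instance_type: str
    nodes: int


# Listed in the order the thread suggests setting things up.
PLAN = [
    Tier("zookeeper", "m3.medium", 1),  # single node for a PoC; 3 or 5 for production
    Tier("kafka", "r3.xlarge", 3),      # 3-node Kafka cluster from the test described above
    Tier("yarn", "c3.xlarge", 2),       # C3 family per the thread; the "xlarge" size is an assumption
]


def deployment_order(plan):
    """Return tier names in the order they should be brought up."""
    return [tier.name for tier in plan]


def per_broker_rate(total_msgs_per_sec, brokers):
    """Rough per-broker rate, assuming load is spread evenly (an assumption)."""
    return total_msgs_per_sec / brokers


if __name__ == "__main__":
    print(deployment_order(PLAN))
    # "Close to 1 million msg/s" is an approximate, remembered figure,
    # so treat the result as an upper-end estimate per broker.
    print(round(per_broker_rate(1_000_000, PLAN[1].nodes)))
```

Under those assumptions, the 3-node Kafka figure works out to at most roughly a third of a million messages per second per broker.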