Thanks Darion and Garry, this is helpful.

I have read that Zookeeper is very latency-sensitive.

I'll definitely try YARN NM on all 3 hosts.

I'd be happy to contribute our findings to a FAQ or wiki page. One finding
so far is that YARN is the most complicated part of the setup process, since
there is scant documentation on how to set up YARN without dragging in the
rest of Hadoop.

By the way, I did come across an excellent presentation by Philip O'Toole of
Loggly (video <https://www.youtube.com/watch?v=LpNbjXFPyZ0>,
slides <http://www.slideshare.net/AmazonWebServices/infrastructure-at-scale-apache-kafka-twitter-storm-elastic-search-arc303-aws-reinvent-2013>)
that discusses how they use Kafka and Storm on EC2. No Samza. O'Toole
mentions using EBS volumes for Kafka and says they create daily volume
snapshots for disaster recovery purposes. I haven't found any mention of
disaster recovery for Kafka or Samza and I wondered if that even makes
sense given the replication/partition approach.



On Thu, Apr 24, 2014 at 11:15 PM, darion <[email protected]> wrote:

> Samza runs on the JVM, so Ubuntu should probably be OK.
>
> I haven't used Samza, but Spark and Storm are working well on EC2,
> and both seem similar.
>
> On 14-4-25 at 3:18 AM, Oshoma Momoh wrote:
>
>> Hi all,
>>
>> I am setting up a Samza cluster for the first time, and am now at the
>> point of deploying on EC2.  Hopefully this is the correct place to ask a
>> few newbie questions. I'm impressed and excited by what I've seen so far,
>> eager to get going with a real deployment.
>>
>> 1. Does anyone have good or bad experiences to report in running Samza
>> atop Ubuntu 14.04 LTS? (Versus 12.04.)
>>
>> 2. Any best practices to recommend in terms of setup on EC2? E.g. instance
>> types to use, EBS volumes versus non-EBS, and so on.  I've found several
>> threads with conflicting opinions on all of this. Our current plan is...
>> (a) Use EBS volumes, separating Zookeeper from Kafka.
>> (b) Start with three m3.large instances and upgrade later as needed,
>> since our initial data volume will be low.
>> (c) Kafka + Zookeeper + YARN Node Manager on two worker nodes, and Kafka +
>> Zookeeper + YARN Resource Manager on the third node.
>>
>> Regards,
>>
>> osh
>>
>> Oshoma Momoh
>> http://pcglab.com
>>
>>
>
