OK, if you don't mind me stretching this simple conversation a bit more..
Say I use the medium ec2 instance.. that's about 7.5G of ram, so I have
about 6.5G total.
On any one node I would have:
DataNode
TaskTracker
Zookeeper
RegionServer
+Map/Reduce Tasks?
What would your gut be for distributing the memory?
Can I run my M/R Tasks all sharing one JVM to share the same memory, or
does each Map or Reduce have its own JVM/Memory requirements?
I'm thinking between 5 to 10 nodes. I know that this seems stingy for
what you guys are used to.. but this is my worst case or minimum
allocation.. if need be I can plan to get more nodes and spread around
the load (bursting on heavy days, etc).. but I don't want to plan/budget
for a large number of nodes until we see good ROI, etc etc etc..
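To make the question concrete, here is a rough sketch of the budget arithmetic I have in mind. Only the ZooKeeper figure comes from this thread (the 1GB suggested below); every other number, including the slot count, is a placeholder I made up for illustration, not a recommendation. It also assumes each task slot needs its own share of memory, which is exactly the part I'm unsure about.

```python
# Rough per-node memory budget; all figures in MB.
TOTAL_MB = 6500  # "about 6.5G" usable, per my estimate above

# Hypothetical heap sizes. Only the ZooKeeper value comes from this
# thread (the 1GB suggested below); the rest are placeholders.
daemons_mb = {
    "DataNode": 1000,
    "TaskTracker": 500,
    "ZooKeeper": 1000,
    "RegionServer": 2000,
}


def task_budget(total_mb, daemons, task_slots):
    """Return (MB left for M/R tasks, MB available per task slot)."""
    left = total_mb - sum(daemons.values())
    if left < 0:
        raise ValueError("daemons alone exceed total memory")
    return left, left // task_slots


# task_slots=4 is an assumed number of concurrent map/reduce tasks.
left_mb, per_slot_mb = task_budget(TOTAL_MB, daemons_mb, task_slots=4)
print(left_mb, per_slot_mb)
```

Swapping in whatever numbers you suggest would show quickly whether one node can hold everything without swapping.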
On 7/14/09 11:54 PM, Nitay wrote:
Yes, Ryan's right. While we recommend running ZooKeeper on separate hosts,
it is really only if you can afford to do so. Otherwise, choose some of your
region server machines and run ZooKeeper alongside those.
On Tue, Jul 14, 2009 at 10:34 PM, Ryan Rawson<[email protected]> wrote:
You can probably host it all on one set of machines. You'll need the
large sized.
Let us know how EC2 works, performance might be off due to the
virtualization.
On Tue, Jul 14, 2009 at 10:32 PM, Fernando Padilla<[email protected]> wrote:
The reason I ask is that I'm planning on setting up a small HBase cluster
in ec2..
having 3 to 5 instances just for zookeeper, while having only 3 to 5
instances for Hbase.. it sounds lop-sided. :)
Does anyone here have any experience with HBase in EC2?
Ryan Rawson wrote:
I run my ZK quorum on my regionservers, but I also have 16 GB ram per
regionserver. I used to run 1gb, and never had problems. Now with
hbase managing the quorum I have 5gb ram, and it's probably overkill,
but better safe than sorry.
On Tue, Jul 14, 2009 at 6:07 PM, Nitay<[email protected]> wrote:
Hi Fernando,
It is recommended that you run ZooKeeper separate from the Region Servers.
On the memory side, our use of ZooKeeper in terms of data stored is minimal
currently. However you definitely don't want it to swap and you want to be
able to handle a large number of connections. A safe value would be
something like 1GB.
-n
On Tue, Jul 14, 2009 at 2:58 PM, Fernando Padilla<[email protected]> wrote:
So.. what's the recommendation for zookeeper?
should I run zookeeper nodes on the same region servers?
should I run zookeeper nodes external to the region servers?
how much memory should I give zookeeper, if it's just used for hbase?