Hi Tim,

I think extra large instances are a basic requirement, assuming you will be running HBase regionservers alongside your tasktrackers (and therefore mapred tasks) and DFS datanodes on the same nodes. I believe this is the most common configuration, because it gives you local I/O and makes the best use of the nodes you have allocated.
HBase regionservers are heap-intensive applications and should have 1G of heap reserved for them alone. Datanodes should also have 1G of heap. Then you need to consider the RAM needed by everything else on the node: the tasktracker and the mapred tasks it launches.

- Andy

> From: tim robertson
> Subject: hbase on EC2 - any guidelines for instance size
> selection?
> To: [email protected]
> Date: Friday, December 19, 2008, 5:22 AM
> Hi,
>
> I have been using EC2 for various MR jobs, and when I am
> doing this I can pretty much determine what EC2 instance
> size will best meet my needs (e.g. large lookup memory
> indexes in Map requires large instance, low memory
> processing intensive stuff happy with many small
> etc) and how many jobs per node etc but for HBase I am
> not sure what MR it is really going to run underneath...
> Are there any rules of thumb for picking EC2 instance
> types for HBase usage? [...]
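To make the budgeting in the reply concrete, here is a rough sketch of the arithmetic. Only the 1G regionserver heap and the 1G datanode heap come from the message. The tasktracker heap, the OS overhead, the per-task heap, and the function names are illustrative assumptions, and the instance RAM figures are the approximate advertised sizes for small, large, and extra large instances. Treat the result as a starting point, not a measured recommendation.

```python
# Rough per-node RAM budget for co-located HBase and Hadoop daemons.
# From the reply: 1 GB regionserver heap, 1 GB datanode heap.
# Assumed for illustration: tasktracker heap, OS overhead, per-task heap.

def ram_left_for_tasks(instance_ram_gb,
                       regionserver_heap_gb=1.0,  # from the reply
                       datanode_heap_gb=1.0,      # from the reply
                       tasktracker_heap_gb=1.0,   # assumption
                       os_overhead_gb=1.0):       # assumption
    """Return the RAM (GB) left for mapred tasks after the daemons."""
    reserved = (regionserver_heap_gb + datanode_heap_gb
                + tasktracker_heap_gb + os_overhead_gb)
    return instance_ram_gb - reserved


def max_task_slots(instance_ram_gb, task_heap_gb, **daemon_heaps):
    """Return how many task JVMs of task_heap_gb fit in the leftover RAM."""
    left = ram_left_for_tasks(instance_ram_gb, **daemon_heaps)
    return max(0, int(left // task_heap_gb))


if __name__ == "__main__":
    # Approximate instance RAM in GB; the 0.5 GB task heap is an assumption.
    for name, ram_gb in [("small", 1.7), ("large", 7.5), ("extra large", 15.0)]:
        slots = max_task_slots(ram_gb, task_heap_gb=0.5)
        print(f"{name}: {slots} task slots of 0.5 GB")
```

Under these assumptions a small instance has no room left for tasks once the daemons are running, which is consistent with the advice to start from the larger instance types.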
