Thank you for your advice. So I really need to look at the memory footprint of the custom MR jobs I run to determine the "jobs per node", right? E.g. with 7G per node, minus the 2G reserved, if I need jobs with -Xmx1G I can run 5 at most, but 4 to be safe... Sound reasonable?
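The per-node arithmetic above can be written out as a small Python sketch. It assumes the 2G reserved is Andy's 1G for the HBase regionserver plus 1G for the datanode (see his reply below), and it models the "safe" margin as a hypothetical `headroom_mb` parameter. `task_slots` and `RESERVED_MB` are illustrative names, not Hadoop or HBase settings.

```python
# Sketch: how many task JVMs of a given -Xmx fit on one node.
# All figures are in MB. The reservations follow Andy's numbers
# (1G regionserver heap, 1G datanode heap); headroom_mb is an
# assumed extra margin, not a Hadoop or HBase setting.

def task_slots(node_mb, reserved_mb, task_heap_mb, headroom_mb=0):
    """Return how many tasks with heap task_heap_mb fit in what is left
    after the daemon reservations and the optional headroom."""
    usable = node_mb - sum(reserved_mb.values()) - headroom_mb
    # Floor division: a partial slot is not a slot; never go below zero.
    return max(usable // task_heap_mb, 0)


RESERVED_MB = {"regionserver": 1024, "datanode": 1024}

print(task_slots(7 * 1024, RESERVED_MB, 1024))                    # 5 (theoretical max)
print(task_slots(7 * 1024, RESERVED_MB, 1024, headroom_mb=1024))  # 4 (one slot kept spare)
```

Keeping some headroom makes sense because -Xmx caps only the Java heap. Each task JVM also uses non-heap memory, and the tasktracker daemon and the OS need memory of their own.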
Reasoning... I do a fair bit of geospatial cross-referencing, so I am building in-memory indexes for the Maps to use (HBase provides point-style data, but I often need to cross-reference it with a set of polygons to preprocess data for mapping etc.). So I always have to watch the in-memory index size to avoid blowing the heap. Additionally, I am reusing JVMs (possible since Hadoop 0.19.0) because in-memory index generation is time-consuming. (http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html)

If not EC2, do people typically rent rack space by the month? Can anyone suggest a good provider for, say, 20 nodes?

Cheers,
Tim

On Fri, Dec 19, 2008 at 2:38 PM, Andrew Purtell <[email protected]> wrote:
> Hi Tim,
>
> I think a basic requirement is extra large instances,
> assuming you will be running HBase regionservers alongside
> your tasktrackers (and therefore mapred tasks), and also
> alongside DFS data nodes. I believe this is the most
> common configuration due to the benefit of local i/o and
> best use of allocated nodes.
>
> HBase regionservers are heap intensive applications, and
> should have 1G reserved for them alone. Datanodes should
> also have 1G heap. Then you need to consider the RAM load
> of the remaining tasks.
>
> - Andy
>
>> From: tim robertson
>> Subject: hbase on EC2 - any guidelines for instance size
>> selection?
>> To: [email protected]
>> Date: Friday, December 19, 2008, 5:22 AM
>> Hi,
>>
>> I have been using EC2 for various MR jobs, and when I am
>> doing this I can pretty much determine what EC2 instance
>> size will best meet my needs (e.g. large lookup memory
>> indexes in Map requires large instance, low memory
>> processing intensive stuff happy with many small
>> etc) and how many jobs per node etc but for HBase I am
>> not sure what MR it is really going to run underneath...
>> Are there any rules of thumb for picking EC2 instance
>> types for HBase usage?
> [...]
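For readers unfamiliar with the kind of in-memory polygon index mentioned at the top of this message, here is a generic Python sketch; it is not the code from the linked blog post. It assumes polygons are simple rings of (x, y) vertices in a planar coordinate system, buckets each polygon by the grid cells its bounding box overlaps, and confirms candidates with an even-odd (ray casting) point-in-polygon test. `PolygonGridIndex`, `cell_size` and `point_in_polygon` are hypothetical names.

```python
import math
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test: is (x, y) inside the closed ring?
    Points exactly on an edge may go either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it,
        # which also guarantees y1 != y2 in the division below.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PolygonGridIndex:
    """Buckets polygon ids by the grid cells their bounding boxes overlap."""

    def __init__(self, polygons, cell_size=1.0):
        self.cell_size = cell_size
        self.polygons = polygons  # id -> list of (x, y) vertices
        self.cells = defaultdict(list)
        for pid, ring in polygons.items():
            xs = [p[0] for p in ring]
            ys = [p[1] for p in ring]
            for cx in range(self._cell(min(xs)), self._cell(max(xs)) + 1):
                for cy in range(self._cell(min(ys)), self._cell(max(ys)) + 1):
                    self.cells[(cx, cy)].append(pid)

    def _cell(self, v):
        return math.floor(v / self.cell_size)

    def containing(self, x, y):
        """Ids of the polygons that contain the point (x, y)."""
        # .get() avoids creating empty buckets in the defaultdict.
        candidates = self.cells.get((self._cell(x), self._cell(y)), [])
        return [pid for pid in candidates
                if point_in_polygon(x, y, self.polygons[pid])]


polygons = {
    "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "triangle": [(5, 0), (9, 0), (7, 3)],
}
index = PolygonGridIndex(polygons, cell_size=2.0)
print(index.containing(1.0, 1.0))  # ['square']
print(index.containing(7.0, 1.0))  # ['triangle']
print(index.containing(4.5, 3.5))  # []
```

The index size grows with the number of cells each bounding box covers, which is one reason it has to be watched against the task heap. A uniform grid is the simplest choice; an R-tree copes better with polygons of very uneven size. With JVM reuse, one common pattern is to hold such an index in a static field so later tasks in the same JVM skip the rebuild.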
