Hi,

I have been using EC2 for various MR jobs, and for those I can pretty much determine which EC2 instance size will best meet my needs (e.g. large in-memory lookup indexes in Map require a large instance, while low-memory, processing-intensive jobs are happy with many small instances), as well as how many jobs to run per node. For HBase, though, I am not sure what MR it is really going to run underneath.

Are there any rules of thumb for picking EC2 instance types for HBase usage? Has anyone done comparisons of many small versus fewer large instances?

My usage is likely to be 1 table (100 million rows) with 3 column families to start with, each with 50 or so columns. Operations will be scans to populate family 2 with the results of operations on family 1, plus many single requests and single inserts.

Or perhaps a suitable answer would be: "each configuration is different based on usage, so start experimenting"?

Thanks,
Tim
