We have a Hadoop cluster continuously running multiple MapReduce jobs on log files that can be up to 10 GB per day. Our log file volume varies a lot depending on the time of year, so every now and then we need to do capacity planning and come up with a forecast, given a projection of log file sizes or a projection of the number of jobs. I need some tips on how to forecast capacity requirements for a Hadoop cluster. Is capacity derivable as a function of current hardware load, log size, and future growth? Does the JobTracker expose any performance data that would help with this planning?
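As a rough illustration of the kind of function I have in mind, here is a minimal sketch. The numbers and names (daily_log_gb, gb_per_slot_hour, the 30% headroom, compound monthly growth) are assumptions for illustration, not measurements from our cluster, and it presumes processing time scales roughly linearly with input size:

    import math

    def required_map_slots(daily_log_gb: float,
                           gb_per_slot_hour: float,
                           processing_window_hours: float,
                           headroom: float = 1.3) -> int:
        """Estimate map slots needed to process one day's logs within the
        allowed window, padded with some headroom for peaks."""
        slot_hours_needed = daily_log_gb / gb_per_slot_hour
        slots = slot_hours_needed / processing_window_hours
        return math.ceil(slots * headroom)

    def forecast_daily_gb(current_daily_gb: float,
                          monthly_growth: float,
                          months: int) -> float:
        """Project daily log volume forward assuming compound monthly growth."""
        return current_daily_gb * (1 + monthly_growth) ** months

    if __name__ == "__main__":
        # Hypothetical inputs: 10 GB/day today, 5% monthly growth, 12 months out,
        # 2 GB processed per map slot per hour, 6-hour processing window.
        future_gb = forecast_daily_gb(current_daily_gb=10, monthly_growth=0.05, months=12)
        print(required_map_slots(daily_log_gb=future_gb,
                                 gb_per_slot_hour=2.0,
                                 processing_window_hours=6))

Is this style of extrapolation reasonable, or is there a better-established way to do it?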
Thanks & Regards,
Sandhya