We have a Hadoop cluster running multiple MapReduce jobs continuously on
logfiles that can be up to 10GB per day. Our logfile sizes vary a lot
depending on the time of year, so every now and then we need to do some
capacity planning and come up with a forecast, given a forecast for
logfile sizes or for the number of jobs. I need some tips on how to
forecast capacity requirements for a Hadoop cluster. Is it derivable as a
function of current hardware load, log size, and future growth? Does the
JobTracker expose any performance data that would be helpful in planning?
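
To make the question concrete, below is the kind of back-of-the-envelope
extrapolation I have in mind. All the numbers and parameter names are just
placeholders (replication factor, disk per node, per-node throughput), not
measurements from our cluster, and it assumes throughput scales roughly
linearly with node count:

    # Rough capacity extrapolation sketch -- all figures are placeholders,
    # not measurements from our cluster.
    import math

    def estimate_nodes(daily_log_gb, growth_factor, retention_days,
                       replication=3, disk_per_node_tb=4.0,
                       gb_per_node_hour=50.0, window_hours=24.0):
        """Estimate node count from a forecast of daily log volume.

        daily_log_gb: forecast daily log size (GB)
        growth_factor: expected growth multiplier over the planning horizon
        retention_days: how long raw + derived data is kept on HDFS
        """
        future_daily_gb = daily_log_gb * growth_factor

        # Storage side: HDFS keeps `replication` copies of retained data.
        storage_gb = future_daily_gb * retention_days * replication
        nodes_for_storage = math.ceil(storage_gb / (disk_per_node_tb * 1024))

        # Processing side: assume a per-node GB/hour rate measured from
        # current job counters, scaling linearly with the number of nodes.
        nodes_for_processing = math.ceil(
            future_daily_gb / (gb_per_node_hour * window_hours))

        return max(nodes_for_storage, nodes_for_processing)

    # Example: 10 GB/day today, expecting 3x growth, 180-day retention.
    print(estimate_nodes(daily_log_gb=10, growth_factor=3, retention_days=180))

Is this a reasonable way to think about it, or do people use something more
sophisticated in practice?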

Thanks & Regards
Sandhya
