A previous post to core-user mentioned a formula for estimating job time. I was wondering whether anyone out there is trying to design a formula that can calculate the run time of a map/reduce job. Obviously there are many variables here, including but not limited to disk speed, network speed, processor speed, input data, many constants, data skew, map complexity, reduce complexity, number of nodes, etc.
As an intellectual challenge, has anyone started trying to write a formula that takes all these factors into account and actually predicts a job's run time in minutes or hours?
