This is pretty predictable. Determine the average time it takes to process an m/r task. If you can currently run 100 m/r tasks simultaneously and you cut that down to 50 simultaneous tasks, your job will take roughly twice as long to run.
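Put another way, you can sketch the estimate from the task count, the number of task slots you can run at once, and the average task time. A minimal sketch (the class name, 2000-task figure, and 30-second average task time below are illustrative, not measured):

```java
// Back-of-the-envelope sketch: total runtime ~= number of task "waves" * average task time.
public class RuntimeEstimate {

    // totalTasks / concurrentSlots, rounded up, gives the number of waves the cluster
    // has to work through; each wave costs roughly one average task time.
    static double estimateSeconds(int totalTasks, int concurrentSlots, double avgTaskSeconds) {
        int waves = (totalTasks + concurrentSlots - 1) / concurrentSlots;
        return waves * avgTaskSeconds;
    }

    public static void main(String[] args) {
        // e.g. 2000 map tasks (one per small file); 30s per task is an assumed figure.
        System.out.println(estimateSeconds(2000, 100, 30.0)); // 100 slots -> 600.0s
        System.out.println(estimateSeconds(2000, 50, 30.0));  //  50 slots -> 1200.0s
    }
}
```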
Granted, this will only give you a rough estimate of how long it will take your job to run.

HTH
-Mike

> Date: Wed, 7 Jul 2010 09:00:47 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: decomission a node
>
> Yes, the effect of "scaling down" was the first thing I wanted to look at.
> To process X GB it currently takes Y seconds with Z nodes.
> If I process X GB with Z/2 nodes, does it take Y/2 seconds?
> How about Z-1, Z-2, Z-3, ... nodes?
>
> Right now my MR job processes a lot of small files (2000 files, ~2.5MB each)
> individually, so the next test would involve changing my MR job to combine
> the small files into bigger pieces (closer to the HDFS block size) and see
> if that is more effective.
>
> Each line of my small files has a timestamp column and 55 columns of
> numerical data, and my reducer needs to calculate the column averages for
> certain time periods (last day, last hour, etc.) based on the timestamp.
>
> Alan
>
> On 07/06/2010 08:06 PM, Allen Wittenauer wrote:
> > On Jul 6, 2010, at 8:35 AM, Michael Segel wrote:
> >
> >> I'm also not sure how dropping a node will test the scalability. You would
> >> be testing resilience.
> >>
> > He's testing scale down, not scale up (which is the way we normally think
> > of things... I was confused by the wording too).
> >
> > In other words, "if I drop a node, how much of a performance hit is my job
> > going to take?"
> >
> > Also, for this type of timing/testing, I'd probably make sure speculative
> > execution is off. It will likely throw some curve balls into the time.
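For reference, turning speculative execution off with the old-style (0.20-era) JobConf API looks roughly like this; TimingTestDriver is just a placeholder driver class name, not something from the thread:

```java
import org.apache.hadoop.mapred.JobConf;

public class TimingTestDriver {
    public static void main(String[] args) {
        JobConf conf = new JobConf(TimingTestDriver.class);
        // Disable speculative attempts so duplicate task attempts don't skew the timing.
        conf.setMapSpeculativeExecution(false);
        conf.setReduceSpeculativeExecution(false);
        // Equivalent to setting mapred.map.tasks.speculative.execution=false
        // and mapred.reduce.tasks.speculative.execution=false in the job config.
        // ... set input/output paths, mapper/reducer classes, then submit the job.
    }
}
```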
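And for the column averages Alan describes, one possible reducer shape is sketched below. It assumes the mapper has already emitted each numeric cell under a per-period, per-column key; the key format and class name are made up for illustration, not taken from his job:

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: assumes the mapper buckets each numeric cell under a key of the
// form "<period>:<columnIndex>" (e.g. "last-hour:17") with the cell value as a
// DoubleWritable, so the reducer just averages what it receives for each key.
public class ColumnAverageReducer
        extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0.0;
        long count = 0;
        for (DoubleWritable v : values) {
            sum += v.get();
            count++;
        }
        if (count > 0) {
            // Emit the average for this (time period, column) bucket.
            context.write(key, new DoubleWritable(sum / count));
        }
    }
}
```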
