Sudhakara, thanks again for the information. The reason I am focused on response time is that I am going to modify Hadoop to skip the sort phase in the map task, run a sample job like the WordCount example on the modified Hadoop, and compare its performance with unmodified Hadoop. I need to know how the sorting step affects performance, and whether there are cases where we can skip the sort in the map phase and get better performance. To do this experiment I need a way to measure performance, and I wonder whether response time is a proper metric for it. Do you suggest any other way to measure performance in this experiment?
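In case it helps to be concrete, this is roughly how I plan to measure it for now: time the whole job from the driver, once on the modified build and once on the unmodified one. The sketch below is just the standard WordCount from the tutorial with timing added; the class name and the input/output paths are placeholders for my setup:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TimedWordCount {

    // Mapper from the standard WordCount example: emit (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer from the standard WordCount example: sum the counts per word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "timed word count");
        job.setJarByClass(TimedWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir

        long start = System.currentTimeMillis();   // wall-clock start
        boolean ok = job.waitForCompletion(true);  // blocks until the job finishes
        System.out.println("Response time (ms): "
                + (System.currentTimeMillis() - start));
        System.exit(ok ? 0 : 1);
    }
}

Since the total response time mixes the map, shuffle, and reduce phases together, I will probably also compare the per-task times shown in the JobTracker web UI, which should isolate the map-side cost of the sort better than the end-to-end time does.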
Samaneh

On Tue, Apr 9, 2013 at 5:43 PM, sudhakara st <sudhakara...@gmail.com> wrote:
> Hi Samaneh,
>
> Increasing the number of reducers for a job will not help as much as you
> are expecting. In most MR jobs, more than 60% of the time is spent in the
> map phase (it depends on what type of operation is performed on the data
> in the map and reduce phases).
>
> Increasing the number of reduces increases the framework overhead, but it
> also improves load balancing and the utilization of the available
> map-reduce slots and system resources. By considering what the job's
> processes require, we can optimize jobs for the best performance with a
> lower cost of failures.
>
> One more thing I cannot understand is why you are worrying so much about
> response time. The response time depends purely on how much data you are
> processing in the job, what type of operation is performed on the data,
> how the data is distributed in the cluster, and the capacity of your
> cluster. An MR job can be called optimized when it contains a balanced
> number of mappers and reducers. For normal MR applications like word
> count, I suggest a mapper-to-reducer ratio of 4:1 if the job runs without
> a combiner; in a word-count-like program with a combiner defined, I would
> suggest 10:1.
>
> While tuning MR jobs we cannot consider only response time as the
> parameter to optimize; there are many other factors to consider. Response
> time does not depend only on the number of reducers configured for the
> job, but on the numerous other factors mentioned above.
>
> On Tue, Apr 9, 2013 at 2:05 PM, Samaneh Shokuhi <samaneh.shok...@gmail.com> wrote:
> > Thanks Sudhakara for your reply.
> > I did my experiments by varying the number of reducers, doubling it in
> > each experiment. I have a question regarding the response time. Suppose
> > there are 6 cluster nodes; in the first experiment I have 3 reducers,
> > doubled to 6 in the second experiment and to 12 in the third. What do
> > we expect to see in the response time? Should it change approximately
> > like T, T/2, T/4, ...?
> > The response time I get does not change like that; the decrease is more
> > like 2% or 3%. So I want to know, by increasing the number of reducers,
> > how much of a decrease in response time should we normally expect?
> >
> > Samaneh
> >
> > On Sun, Apr 7, 2013 at 7:53 PM, sudhakara st <sudhakara...@gmail.com> wrote:
> > > Hi Samaneh,
> > >
> > > You can experiment with:
> > >
> > > 1. Varying the number of reducers (mapred.reduce.tasks) and the task
> > > slots per node (configure these parameters according to your system
> > > capacity):
> > >
> > > mapred.tasktracker.map.tasks.maximum
> > > mapred.tasktracker.reduce.tasks.maximum
> > >
> > > Tasktrackers have a fixed number of slots for map tasks and for
> > > reduce tasks. The precise number depends on the number of cores and
> > > the amount of memory on the tasktracker nodes; for example, a
> > > quad-core node with 8 GB of memory may be able to run 3 map tasks and
> > > 2 reduce tasks simultaneously (not a precise figure, it depends on
> > > what type of job you are running).
> > >
> > > The right number of reduces seems to be 0.95 or 1.75 * (nodes *
> > > mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can
> > > launch immediately and start transferring map outputs as the maps
> > > finish. At 1.75 the faster nodes will finish their first round of
> > > reduces and launch a second round of reduces, doing a much better job
> > > of load balancing.
> > >
> > > 2. These are some of the main job tuning factors in terms of cluster
> > > resource utilization (CPU, memory, I/O, network) and response time:
> > >
> > > A) Map-side sort and shuffle:
> > > io.sort.mb
> > > io.sort.record.percent
> > > io.sort.spill.percent
> > > io.sort.factor
> > > mapred.reduce.parallel.copies
> > >
> > > B) Compression of mapper and reducer outputs:
> > > mapred.map.output.compression.codec
> > >
> > > C) Enabling/disabling speculative task execution:
> > > mapred.map.tasks.speculative.execution
> > > mapred.reduce.tasks.speculative.execution
> > >
> > > D) Enabling JVM reuse:
> > > mapred.job.reuse.jvm.num.tasks
> > >
> > > On Sun, Apr 7, 2013 at 10:31 PM, Samaneh Shokuhi <samaneh.shok...@gmail.com> wrote:
> > > > Thanks Sudhakara for your reply.
> > > > So if the number of mappers depends on the data size, maybe the
> > > > best way to do my experiments is to increase the number of reducers
> > > > based on the number of estimated blocks in the data file. Actually
> > > > I want to know how the response time changes by changing the number
> > > > of mappers and reducers.
> > > > Any idea about how to do this kind of experiment?
> > > >
> > > > Samaneh
> > > >
> > > > On Sun, Apr 7, 2013 at 6:29 PM, sudhakara st <sudhakara...@gmail.com> wrote:
> > > > > Hi Samaneh,
> > > > >
> > > > > The number of map tasks for a given job is driven by the number
> > > > > of input splits in the input data. Ideally, in the default
> > > > > configuration, a map task is spawned for each input split (one
> > > > > per block). So your 2.5 GB of data contains 44 blocks, therefore
> > > > > your job takes 44 map tasks. At minimum, with FileInputFormat
> > > > > derivatives, a job will have at least one map per file, and can
> > > > > have multiple maps per file if a file extends beyond a single
> > > > > block (file size is more than the block size). The
> > > > > *mapred.map.tasks* parameter is just a hint to the InputFormat
> > > > > for the number of maps; it does not have any effect if the number
> > > > > of blocks in the input data is more than the specified value. It
> > > > > is not possible to specify the number of mappers a job must run,
> > > > > but it is possible to explicitly specify the number of reducers
> > > > > for a job by using the *mapred.reduce.tasks* property.
> > > > >
> > > > > The replication factor is not related in any way to the number of
> > > > > mappers and reducers.
> > > > >
> > > > > On Sun, Apr 7, 2013 at 7:38 PM, Samaneh Shokuhi <samaneh.shok...@gmail.com> wrote:
> > > > > > Hi All,
> > > > > > I am doing some experiments by running the WordCount example on
> > > > > > Hadoop. I have a cluster with 7 nodes. I want to run the
> > > > > > WordCount example with 3 mappers and 3 reducers and compare the
> > > > > > response time with other experiments where the number of
> > > > > > mappers and reducers is increased to 6, 12, and so on.
> > > > > > For the first experiment I set the number of mappers and
> > > > > > reducers to 3 in the WordCount example source code, set the
> > > > > > number of replications to 3 in the Hadoop configuration, and
> > > > > > also set the maximum number of tasks per node to 1.
> > > > > > But when I run the sample with big data, like 2.5 GB, I can see
> > > > > > 44 map tasks and 3 reduce tasks running!
> > > > > > What parameters do I need to set to have (3 mappers,
> > > > > > 3 reducers), (6M, 6R), and (12M, 12R)? As I mentioned, I have a
> > > > > > cluster with 1 namenode and 6 datanodes.
> > > > > > Is the number of replications related to the number of mappers
> > > > > > and reducers?
> > > > > > Regards,
> > > > > > Samaneh
> > > > >
> > > > > --
> > > > > Regards,
> > > > > ..... Sudhakara.st
> > >
> > > --
> > > Regards,
> > > ..... Sudhakara.st
>
> --
> Regards,
> ..... Sudhakara.st
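One more thing: to make the comparison between the modified and unmodified build fair, I plan to pin the tuning settings mentioned earlier in this thread to the same values for every run. This is only a rough sketch of what I have in mind; the class name and the values are placeholders for my cluster, not recommendations:

import org.apache.hadoop.conf.Configuration;

// A fixed configuration used for every run, so the modified and
// unmodified Hadoop builds are compared like for like.
public class ExperimentConf {
    public static Configuration create(int reducers) {
        Configuration conf = new Configuration();
        // Explicit reducer count, as suggested above.
        conf.setInt("mapred.reduce.tasks", reducers);
        // Speculative execution can re-run tasks and distort timings,
        // so keep it off while measuring.
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        // Reuse task JVMs so startup cost does not dominate; -1 = no limit.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        return conf;
    }
}

The driver would then build the job from this, e.g. Job job = new Job(ExperimentConf.create(6), "timed word count"); if I understand correctly, calling job.setNumReduceTasks(6) would have the same effect for the reducer count.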