Hi NguyenHuynh,

If you need a precise solution, there is an article giving the following formulas to compute the number of maps/reduces:

Number of maps:    max(min(block_size, data / #maps), min_split_size)
Number of reduces: 0.95 * num_nodes * mapred.tasktracker.reduce.tasks.maximum
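A minimal sketch of the formulas above, assuming example values for block size, data size, node count, and mapred.tasktracker.reduce.tasks.maximum. Note that the first expression actually yields a split *size*; dividing the data size by it gives the map count. All names and numbers here are illustrative, not real Hadoop API calls:

```python
# Rough calculators for the two formulas above.
# All values are example assumptions, not read from a real cluster.

def num_maps(block_size, data_size, desired_maps, min_split_size):
    # max(min(block_size, data / #maps), min_split_size) gives the
    # split size; the number of maps is data_size / split size.
    split = max(min(block_size, data_size // desired_maps), min_split_size)
    return data_size // split

def num_reduces(num_nodes, max_reduce_tasks_per_node):
    # 0.95 * num_nodes * mapred.tasktracker.reduce.tasks.maximum
    return int(0.95 * num_nodes * max_reduce_tasks_per_node)

# Example: 3 nodes (as in the original question), 64 MB blocks,
# 1 GB of input, and 2 reduce slots per tasktracker.
MB = 1024 * 1024
print(num_maps(64 * MB, 1024 * MB, 10, 16 * MB))  # 64 MB splits -> 16 maps
print(num_reduces(3, 2))                          # -> 5
```

With only 3 nodes, the reduce formula already suggests around 5 reduces; the map count is driven mostly by your input size and block size.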
If you need an "understandable" solution, try the following link:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces

If you just need a rough estimate:

Maps: the right level of parallelism for maps seems to be around 10-100 maps per node.

There is also another rule of thumb (I find it worse than the others) that I couldn't find just now. It's something like:

Maps:    an even number several times larger than the number of task trackers.
Reduces: the same as the number of task trackers.

Are you using the namenode and jobtracker as tasktrackers, too? If you have difficulties, please let me know.

Rasit

2009/2/16 nguyenhuynh <[email protected]>:
> Hi all!
>
> I have 3 machines that run Hadoop/HBase map-reduce. I don't know what
> values to set for the number of map tasks and reduce tasks.
>
> How many map and reduce tasks should I use in this case?
>
> Please help me!
>
> Thanks,
>
> Regards,
>
> NguyenHuynh

--
M. Raşit ÖZDAŞ
