Hi,

Steps to do this:

1) Map: only defines the key/value pair for each number.
2) Combiner: sorts locally over each chunk of the dataset.
3) Reducer: sorts afterwards over all the chunks, globally.
   --------------> Output is sorted.
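
The three steps above can be sketched in plain Java, without any Hadoop
classes. This is only a minimal illustration under assumptions of mine: the
class name CountingSortSketch and the methods countChunk, mergeCounts and
expand are made up for this mail, the chunks are tiny hard-coded arrays
instead of files, and the "combiner" and "reducer" are ordinary method calls
rather than real Hadoop tasks. It follows the counting idea from the example
further down in this mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountingSortSketch {

    // "Map" + "Combiner" (hypothetical helper): count how often each
    // integer occurs in one chunk. TreeMap keeps the keys in sorted order.
    static TreeMap<Integer, Integer> countChunk(int[] chunk) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int value : chunk) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // "Reducer" (hypothetical helper): add the per-chunk counts together
    // into one final map covering all chunks.
    static TreeMap<Integer, Integer> mergeCounts(List<TreeMap<Integer, Integer>> partials) {
        TreeMap<Integer, Integer> total = new TreeMap<>();
        for (TreeMap<Integer, Integer> partial : partials) {
            for (Map.Entry<Integer, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // Expand the sorted <Integer, Count> pairs back into the full sorted list.
    static List<Integer> expand(TreeMap<Integer, Integer> counts) {
        List<Integer> sorted = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                sorted.add(e.getKey());
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Three small stand-in "files" with integers between 100 and 200.
        int[][] chunks = { {150, 101, 150}, {200, 100}, {101, 199, 150} };

        List<TreeMap<Integer, Integer>> partials = new ArrayList<>();
        for (int[] chunk : chunks) {
            partials.add(countChunk(chunk));
        }

        TreeMap<Integer, Integer> total = mergeCounts(partials);
        System.out.println(total);         // {100=1, 101=2, 150=3, 199=1, 200=1}
        System.out.println(expand(total)); // [100, 101, 101, 150, 150, 150, 199, 200]
    }
}
```

Both the per-chunk step and the final step only add up counts per key, which
is the kind of operation where the same class can serve as combiner and
reducer.
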
Note: set the combiner and the reducer to the same class.

Example: Let us assume that our data set (integers) is constrained to values
between 100 and 200, and that we have 5 files, each containing 1000 random
integers between 100 and 200 (so a total of 5000 integers between 100 and
200). We read each file into a Map, and then in the Reduce phase we produce a
final Map which contains the count of all the integers. Now if we sort all the
integers from the final Map and output them into a list data structure in the
form of <Integer, Count>, then we have sorted all the data (see figure below).

Aside: In Java, you don't even have to come up with the data structure that I
am talking about. If you just use a TreeMap
<http://java.sun.com/javase/6/docs/api/index.html?java/util/TreeMap.html>
in the final Reduce phase, then all the keys (i.e. the data) are already
sorted, as long as the key type (e.g. String, Integer, etc.) implements the
Comparable
<http://java.sun.com/javase/6/docs/api/index.html?java/lang/Comparable.html>
interface. (Hadoop <http://hadoop.apache.org/> has something similar called
WritableComparable
<http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparable.html>,
and I am using a TreeMap that takes Strings as keys in Reducer
<http://code.google.com/p/dalalstreet/source/browse/trunk/MapReduce/src/org/karticks/mapreduce/Reducer.java>.)

Thanks
Samir

On Tue, May 15, 2012 at 11:31 PM, @dataElGrande <markydale...@gmail.com> wrote:

>
> Check out Pentaho's howto's when dealing with Hadoop or NoSQL or anything
> big data related. http://wiki.pentaho.com/display/BAD/How+To%27s
>
>
> madhu_sushmi wrote:
> >
> > Hi,
> > I need to implement distributed sorting using Hadoop. I am quite new to
> > Hadoop and I am getting confused. If I want to implement Merge sort, what
> > my Map and reduce should be doing. ? Should all the sorting happen at
> > reduce side?
> >
> > Please help. This is an urgent requirement. Please guide me.
> >
> > Thanks,
> > Madhu
> >
>
> --
> View this message in context:
> http://old.nabble.com/Hadoop---Distributed-sorting-tp32876784p33849704.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>