That makes sense, Samaneh. I was thinking about it more coarsely. As far as I know, currently there is no way to skip the sort phase - you would need to modify the code.
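To illustrate why the sort matters even with a single reducer, here is a minimal plain-Java sketch (no Hadoop dependency) of what I understand the map side to do: tag each output record with its partition, then sort the buffer by (partition, key). The class and method names (MapSideSortSketch, Entry, partitionAndSort) are made up for this example and are not taken from MapTask.java. The partition formula is the one used by the default HashPartitioner. Treat it as a simplified model, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MapSideSortSketch {

    /** One map output record, tagged with the partition (reducer) it goes to. */
    static final class Entry {
        final int partition;
        final String key;
        final int value;

        Entry(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /** Same formula as Hadoop's default HashPartitioner. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Hypothetical helper: emits (word, 1) for every word, tags it with its
     * partition, and sorts by partition first, then by key.
     */
    static List<Entry> partitionAndSort(List<String> words, int numReducers) {
        List<Entry> buffer = new ArrayList<>();
        for (String word : words) {
            buffer.add(new Entry(partitionFor(word, numReducers), word, 1));
        }
        buffer.sort(Comparator.comparingInt((Entry e) -> e.partition)
                .thenComparing(e -> e.key));
        return buffer;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("the", "cat", "and", "the", "hat", "and", "the");
        // With one reducer every record lands in partition 0, but the sort
        // still places equal keys next to each other.
        for (Entry e : partitionAndSort(words, 1)) {
            System.out.println(e.partition + "\t" + e.key + "\t" + e.value);
        }
    }
}
```

With one reducer the partition is always 0, so the sort is doing only the grouping work: equal keys end up adjacent, which is what lets the reducer see all values for a key in one reduce() call. Skipping the sort would mean replacing that grouping with something else, such as a hash table on the reduce side.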
-Sandy

On Thu, Apr 18, 2013 at 3:42 PM, Samaneh Shokuhi <samaneh.shok...@gmail.com> wrote:

> Hi Sandy,
>
> As I understand it, a map task involves these phases: 1) map processing,
> 2) spilling buffer contents to disk, 3) partitioning, 4) sorting, and
> 5) merging spill files into a single file.
> Mmm, maybe I am wrong, but I thought outputs are grouped in the
> partitioning phase and then sorted in the sort phase before being sent
> to the reducer. Is that what happens in the mapper phase?
>
> Regarding your question: I think the sort phase is one of the
> time-consuming phases in the mapper. What I am trying to do is find out
> what percentage of mapper time is spent on the sort phase and
> investigate whether it is possible to skip the sort in some cases. For
> example, if we have only one reducer, is it possible to skip the sorting
> and just flush the data directly to the reducer?
>
> Samaneh
>
> On Thu, Apr 18, 2013 at 8:46 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
> > Hi Samaneh,
> >
> > If you want to see the map outputs post sort/shuffle, the easiest way
> > is probably to use an IdentityReducer and inspect the job.
> >
> > Can you be more specific on what you need to disable the sort phase
> > for? Sorting is used in part to group map outputs and route them to
> > the correct reducer.
> >
> > -Sandy
> >
> > On Thu, Apr 18, 2013 at 1:53 AM, Samaneh Shokuhi <samaneh.shok...@gmail.com> wrote:
> >
> > > Hello All,
> > >
> > > I am doing some experiments with the WordCount example running on a
> > > Hadoop cluster. I have some questions:
> > >
> > > 1) How can I monitor the output from the mapper before it is flushed
> > > to the reducer? (In fact, I want to see how the keys are sorted.)
> > >
> > > 2) In one of my experiments I need to disable the sort phase in the
> > > mapper and send unsorted data to the reducer. Is there any way to
> > > disable this sort in the mapper, or do I need to modify Hadoop to
> > > disable it? As I understand it, this functionality is implemented in
> > > MapTask.java.
> > > And of course I don't want to set the number of reducers to zero,
> > > because I need to have at least one reducer.
> > >
> > > So, any idea how to disable the sort phase in the mapper and monitor
> > > the output?
> > >
> > > Best,
> > > Samaneh