I am confused about how to start the next job after the previous one finishes. Could you make it clear with a rough example? Also, do I need to use SequenceFileInputFormat to keep the results in memory and then access them?

On Tue, Apr 7, 2009 at 10:43 AM, Sharad Agarwal <shara...@yahoo-inc.com> wrote:

> > Suppose a batch of inputsplits arrives at the beginning for every map, and
> > reduce gives the word, frequency for this batch of inputsplits.
> > Now after this another batch of inputsplits arrives and the results from
> > the subsequent reduce are aggregated with the previous results (if the word
> > "that" has frequency 2 in the previous processing and in this processing it
> > occurs 1 time, then the frequency of "that" is now maintained as 3).
> > In the next map-reduce "that" comes 4 times, now its frequency is
> > maintained as 7....
>
> You could merge the result from the previous step in the reducer. If the
> number of unique words is not large, the output from the previous step can
> be loaded into an in-memory hash. This can be used to add the count from the
> previous step to the current step.
> In case you expect the unique words list to be too large to fit in memory,
> you could read the previous step's output directly from HDFS, and since it
> would be a sorted file you could just walk it and merge the counts in a
> single pass in the reduce function.
>
> - Sharad

--
Aayush Garg,
Phone: +41 764822440
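
For illustration, the two merge strategies Sharad describes might look roughly like the sketch below. It is plain Java with no Hadoop dependency: the maps and lists stand in for the previous step's output on HDFS and the current reducer's counts, and the names RunningWordCount, mergeInMemory and mergeSorted are hypothetical. In a real job this logic would sit in the reducer, and a driver would run one job per batch, starting the next only after the previous one has finished (for example with a blocking call such as JobClient.runJob in the old API).

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Hadoop.
class RunningWordCount {

    // Strategy 1: the previous totals fit in memory, so load them into a
    // hash map and add the current batch's counts to them.
    static Map<String, Long> mergeInMemory(Map<String, Long> previous,
                                           Map<String, Long> current) {
        Map<String, Long> merged = new HashMap<>(previous);
        for (Map.Entry<String, Long> e : current.entrySet()) {
            merged.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return merged;
    }

    // Strategy 2: both inputs are sorted by word (as reducer output is),
    // so walk them side by side and merge the counts in a single pass.
    static List<Map.Entry<String, Long>> mergeSorted(
            List<Map.Entry<String, Long>> previous,
            List<Map.Entry<String, Long>> current) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < previous.size() && j < current.size()) {
            Map.Entry<String, Long> p = previous.get(i);
            Map.Entry<String, Long> c = current.get(j);
            int cmp = p.getKey().compareTo(c.getKey());
            if (cmp == 0) {          // same word in both: add the counts
                out.add(new AbstractMap.SimpleImmutableEntry<>(
                        p.getKey(), p.getValue() + c.getValue()));
                i++;
                j++;
            } else if (cmp < 0) {    // word only in the previous output
                out.add(p);
                i++;
            } else {                 // word seen for the first time
                out.add(c);
                j++;
            }
        }
        while (i < previous.size()) out.add(previous.get(i++));
        while (j < current.size()) out.add(current.get(j++));
        return out;
    }

    public static void main(String[] args) {
        // The example from the thread: "that" occurs 2, then 1, then 4 times.
        Map<String, Long> totals = new HashMap<>();
        totals = mergeInMemory(totals, Collections.singletonMap("that", 2L));
        totals = mergeInMemory(totals, Collections.singletonMap("that", 1L));
        totals = mergeInMemory(totals, Collections.singletonMap("that", 4L));
        System.out.println("that=" + totals.get("that"));
    }
}
```

The in-memory version is simpler but needs the whole vocabulary to fit in the reducer's heap. The sorted walk only holds one entry from each side at a time, at the cost of requiring both inputs to be sorted by the same key order.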