Aayush, out of curiosity, why do you want to model wordcount this way? What benefit do you see?
Norbert

On 4/6/09, Aayush Garg <aayush.g...@gmail.com> wrote:
> Hi,
>
> I want to experiment with the wordcount example in a different way.
>
> Suppose we have very large data. Instead of splitting all the data at
> once, we want to feed some splits into the map-reduce job at a time. I
> want to model the Hadoop job like this:
>
> Suppose a batch of input splits arrives at the beginning for every map,
> and reduce emits the (word, frequency) pairs for this batch of input
> splits. Then another batch of input splits arrives, and the results from
> the subsequent reduce are aggregated with the previous results (if the
> word "that" had frequency 2 in the previous run and occurs 1 time in
> this run, its frequency is now maintained as 3). If "that" comes 4 times
> in the next map-reduce, its frequency is maintained as 7...
>
> And this process goes on like this.
> How would I model input splits like this, and how can these continuous
> map-reduces be kept running? In what form should I keep the results of
> one map-reduce so that I can aggregate them with the output of the next
> map-reduce?
>
> Thanks,
>
> Aayush
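
In any case, Hadoop has no built-in "incremental" mode for this, but one common way to get the aggregation you describe is to feed each job's output directory back into the next job as a second input. Below is a minimal sketch against the old org.apache.hadoop.mapred API (the class names such as IncrementalWordCount are my own invention, not anything from a Hadoop example): MultipleInputs routes the new raw-text batch through a tokenizing mapper and the previous round's counts through an identity-style parsing mapper, and a single reducer sums both.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class IncrementalWordCount {

  // Tokenizes the new batch of raw text and emits (word, 1).
  public static class TokenizerMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Re-emits the previous round's totals ("word<TAB>count" lines, as
  // written by TextOutputFormat) so the reducer can add them in.
  public static class PreviousCountsMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, LongWritable> {
    public void map(Text word, Text count,
        OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      output.collect(word, new LongWritable(Long.parseLong(count.toString())));
    }
  }

  // Sums the old total and the new occurrences for each word.
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
    public void reduce(Text key, Iterator<LongWritable> values,
        OutputCollector<Text, LongWritable> output, Reporter reporter)
        throws IOException {
      long sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new LongWritable(sum));
    }
  }

  // args: <new-batch-dir> <previous-counts-dir> <output-dir>
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(IncrementalWordCount.class);
    conf.setJobName("incremental-wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setCombinerClass(SumReducer.class);  // summing is associative
    conf.setReducerClass(SumReducer.class);

    // New raw text goes through the tokenizer; the previous round's
    // tab-separated output is parsed back in via KeyValueTextInputFormat.
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        TextInputFormat.class, TokenizerMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        KeyValueTextInputFormat.class, PreviousCountsMapper.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));

    JobClient.runJob(conf);
  }
}

For the very first batch there is no previous output, so either run a plain wordcount once or point args[1] at an empty directory; after that, each round's output directory becomes the next round's previous-counts input. To keep this "continuously running" you would wrap it in a driver loop that submits one such job per arriving batch.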