> I'm confused about how I would start the next job after finishing the
> first one. Could you make it clear with a rough example?

See the JobControl class for chaining jobs; it also lets you specify dependencies between jobs. The TestJobControl class has example code.

> Also, do I need to use SequenceFileInputFormat to keep the results in
> memory and then access them?

Not really. You just have to use the reader that corresponds to the format the data was written in. For example, if it was written with TextOutputFormat (the default), you can read it back with TextInputFormat. The reader can be created in the reducer's initialization code; in the new API (org.apache.hadoop.mapreduce.Reducer) that is the "setup" method. There you can load the word, count mappings into a HashMap. If you don't want to load all the data into memory, you can instead create the reader in "setup" and keep calling next (LineRecordReader#nextKeyValue()) in the reduce function whenever the reduce key is greater than the current key from the reader. This works because both the reduce keys and the previous job's output file are sorted by key.
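A rough sketch of the chaining, using the org.apache.hadoop.mapred.jobcontrol API. The two JobConf objects are assumed to be already configured for your jobs; the names here are illustrative, not from any shipped example:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class ChainedJobs {
  // firstConf/secondConf are hypothetical, fully configured JobConfs
  // for the two jobs you want to run in sequence.
  public static void runChained(JobConf firstConf, JobConf secondConf)
      throws Exception {
    Job first = new Job(firstConf);
    Job second = new Job(secondConf);
    second.addDependingJob(first);   // second starts only after first succeeds

    JobControl jc = new JobControl("chained jobs");
    jc.addJob(first);
    jc.addJob(second);

    Thread runner = new Thread(jc);  // JobControl implements Runnable
    runner.start();
    while (!jc.allFinished()) {      // poll until both jobs complete
      Thread.sleep(1000);
    }
    jc.stop();
  }
}
```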
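The HashMap approach might look like the sketch below, assuming the previous job wrote "word&lt;TAB&gt;count" lines via TextOutputFormat. The configuration key "prev.output.path" is a hypothetical name you would set yourself when submitting the job:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LookupReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  private final Map<String, Long> counts = new HashMap<String, Long>();

  @Override
  protected void setup(Context context) throws IOException {
    // "prev.output.path" is a made-up key pointing at the earlier
    // job's output file (e.g. .../part-00000).
    Path p = new Path(context.getConfiguration().get("prev.output.path"));
    FileSystem fs = p.getFileSystem(context.getConfiguration());
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(p)));
    String line;
    while ((line = in.readLine()) != null) {
      String[] parts = line.split("\t");
      counts.put(parts[0], Long.parseLong(parts[1]));
    }
    in.close();
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values,
      Context context) throws IOException, InterruptedException {
    Long prev = counts.get(key.toString());  // lookup from the earlier job
    if (prev != null) {
      context.write(key, new LongWritable(prev));
    }
  }
}
```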
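The streaming alternative, advancing the reader in step with the sorted reduce keys, can be sketched in plain Java (outside Hadoop, with a StringReader standing in for the previous job's output):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class SideFileJoin {
  // Streams a sorted "word<TAB>count" side file in step with sorted
  // reduce keys, instead of loading everything into a HashMap.
  private final BufferedReader reader;
  private String curWord;   // current key from the side file, null at EOF
  private long curCount;

  public SideFileJoin(Reader sideFile) throws IOException {
    this.reader = new BufferedReader(sideFile);
    advance();
  }

  private void advance() throws IOException {
    String line = reader.readLine();   // one "word<TAB>count" record
    if (line == null) { curWord = null; return; }
    String[] parts = line.split("\t");
    curWord = parts[0];
    curCount = Long.parseLong(parts[1]);
  }

  // Called once per reduce key, in sorted key order.
  public Long countFor(String reduceKey) throws IOException {
    // Skip side-file entries that sort before the reduce key.
    while (curWord != null && curWord.compareTo(reduceKey) < 0) {
      advance();
    }
    if (curWord != null && curWord.equals(reduceKey)) {
      return curCount;
    }
    return null;  // no entry for this key
  }

  public static void main(String[] args) throws IOException {
    SideFileJoin join =
        new SideFileJoin(new StringReader("apple\t3\nbanana\t7\npear\t2\n"));
    System.out.println(join.countFor("apple"));   // 3
    System.out.println(join.countFor("cherry"));  // null (skips banana)
    System.out.println(join.countFor("pear"));    // 2
  }
}
```

Because each call only moves the reader forward, this keeps a single record in memory at a time, at the cost of requiring the reduce keys to arrive in sorted order (which MapReduce guarantees).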
- Sharad