On 06/18/2012 10:40 AM, Mark Kerzner wrote:
that sounds very interesting, and I may implement such a workflow, but can I write back to HDFS in the mapper? In the reducer it is a standard context.write(), but it is a different context.
Both Mapper.Context and Reducer.Context descend from TaskInputOutputContext, which is where write() is defined, so both contexts emit their output through the same method.
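To make the symmetry concrete, here's a minimal sketch against the new (org.apache.hadoop.mapreduce) API; the class name and the key/value types are just illustrative, not anything from your job:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Pass-through mapper: the context.write() call here is the same
    // TaskInputOutputContext.write() a reducer uses.
    public class PassThroughMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit directly from the mapper; with zero reducers this
            // record goes straight to the job's OutputFormat.
            context.write(new Text(key.toString()), value);
        }
    }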
If you don't have a Reducer -- only Mappers doing fully parallel processing -- set the number of reduce tasks to zero when you configure the job. The framework then treats map output as the final output and writes it through the job's configured OutputFormat, exactly as your reducer context does with reducer output today.
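A rough driver sketch for that map-only setup; MapOnlyJob and the argv paths are placeholders, it reuses the PassThroughMapper sketch above, and I'm assuming the newer Job.getInstance factory (on older releases, new Job(conf, name) is the equivalent):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class MapOnlyJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "map-only example");
            job.setJarByClass(MapOnlyJob.class);
            job.setMapperClass(PassThroughMapper.class);

            // Zero reducers makes this a map-only job: each mapper's
            // context.write() goes straight through the OutputFormat
            // to HDFS.
            job.setNumReduceTasks(0);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }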