Thank you for the great instructions! Mark
On Mon, Jun 18, 2012 at 9:53 AM, John Armstrong <j...@ccri.com> wrote:

> On 06/18/2012 10:40 AM, Mark Kerzner wrote:
>
>> that sounds very interesting, and I may implement such a workflow, but
>> can I write back to HDFS in the mapper? In the reducer it is a standard
>> context.write(), but it is a different context.
>
> Both Mapper.Context and Reducer.Context descend from
> TaskInputOutputContext, which is where the write() method is defined, so
> they're both outputting their data in the same way.
>
> If you don't have a Reducer -- only Mappers and fully parallel data
> processing -- then when you configure your job you set the number of
> reducers to zero. Then the mapper context knows that mapper output is the
> last step, so it uses the specified OutputFormat to write out the data,
> just like your reducer context currently does with reducer output.
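For the archives, here is a minimal sketch of the map-only setup John describes, using the org.apache.hadoop.mapreduce API. The class names, paths, and the pass-through mapper are hypothetical illustrations, not code from this thread; the essential line is `job.setNumReduceTasks(0)`, after which the mapper's `context.write()` goes straight through the configured OutputFormat to HDFS.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical map-only job: with zero reducers, each mapper's output
// is written directly by the OutputFormat, just like reducer output.
public class MapOnlyJob {

  // The mapper writes through context.write(), exactly as a reducer would;
  // both Context classes inherit write() from TaskInputOutputContext.
  public static class PassThroughMapper
      extends Mapper<LongWritable, Text, NullWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(NullWritable.get(), value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "map-only example");
    job.setJarByClass(MapOnlyJob.class);
    job.setMapperClass(PassThroughMapper.class);

    // The key step from the thread: zero reducers makes mapper output final.
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Run it as a normal MapReduce jar with input and output HDFS paths as arguments; the output directory will contain one part file per map task, since there is no reduce phase to merge them.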