Aaron, I am using 0.20.1 and I'm not finding
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. I'm using the
download page, where the tarball is dated from Sep. '09.
Sounds like I need to look at the code repository.

On Tue, Dec 8, 2009 at 1:39 PM, Aaron Kimball <aa...@cloudera.com> wrote:

> Geoffry,
>
> There are two MultipleOutputs implementations; one for the new API, one
> for the old one.
>
> The new API (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) does
> not have a getCollector() method. It is intended to work with
> org.apache.hadoop.mapreduce.Mapper and its associated Context object.
>
> The old API implementation of MultipleOutputs
> (org.apache.hadoop.mapred.lib.MultipleOutputs) is intended to work with
> org.apache.hadoop.mapred.Mapper, Reporter, and friends.
>
> If you're going to use the new org.apache.hadoop.mapreduce-based code,
> you should not need to import anything in the mapred package. That having
> been said -- I just realized that the new-API-compatible MultipleOutputs
> implementation is not in Hadoop 0.20; it's only in the unreleased 0.21.
> If you're using 0.20, you should probably stick with the old API for your
> process.
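>
> In case it helps, the usual old-API pattern looks roughly like this (an
> untested sketch from memory -- the named output "text", the MyJob class,
> and the Text key/value types are placeholders for whatever you need).
> Note that named outputs are declared once in the driver, not inside
> reduce():
>
>   import java.io.IOException;
>   import java.util.Iterator;
>   import org.apache.hadoop.io.Text;
>   import org.apache.hadoop.mapred.JobConf;
>   import org.apache.hadoop.mapred.MapReduceBase;
>   import org.apache.hadoop.mapred.OutputCollector;
>   import org.apache.hadoop.mapred.Reducer;
>   import org.apache.hadoop.mapred.Reporter;
>   import org.apache.hadoop.mapred.TextOutputFormat;
>   import org.apache.hadoop.mapred.lib.MultipleOutputs;
>
>   // Driver fragment: declare the named output on the JobConf.
>   JobConf conf = new JobConf(MyJob.class);
>   MultipleOutputs.addNamedOutput(conf, "text",
>       TextOutputFormat.class, Text.class, Text.class);
>
>   // Reducer: the old interface hands you the Reporter in reduce().
>   public class MyReducer extends MapReduceBase
>       implements Reducer<Text, Text, Text, Text> {
>
>     private MultipleOutputs mos;
>
>     public void configure(JobConf job) {
>       mos = new MultipleOutputs(job);
>     }
>
>     public void reduce(Text key, Iterator<Text> values,
>         OutputCollector<Text, Text> output, Reporter reporter)
>         throws IOException {
>       while (values.hasNext()) {
>         mos.getCollector("text", reporter).collect(key, values.next());
>       }
>     }
>
>     public void close() throws IOException {
>       mos.close();  // flushes and closes every named output
>     }
>   }
>
> One caveat: named output names must be alphanumeric and declared up
> front, so for one file per key you would want addMultiNamedOutput() and
> the three-argument getCollector(name, part, reporter), where the part
> name can vary at run time.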
>
> Cheers,
> - Aaron
>
> On Tue, Dec 8, 2009 at 12:40 PM, Geoffry Roberts
> <geoffry.robe...@gmail.com> wrote:
>
>> All,
>>
>> This one has me stumped.
>>
>> What I want to do is output multiple files from my reducer, one for
>> each key value. I also want to avoid any deprecated parts of the API.
>>
>> As suggested, I switched from using MultipleTextOutputFormat to
>> MultipleOutputs, but have run into an impasse. MultipleOutputs'
>> getCollector method requires a Reporter as a parameter, but as far as
>> I can tell, the new API doesn't give me one. The only Reporter I can
>> find is in the context object, and it is declared protected.
>>
>> Am I stuck, or just missing something?
>>
>> My code:
>>
>>   @Override
>>   public void reduce(Text key, Iterable<Text> values, Context context)
>>       throws IOException {
>>     String fileName = key.toString();
>>     MultipleOutputs.addNamedOutput(
>>         (JobConf) context.getConfiguration(), fileName,
>>         OutputFormat.class, Text.class, Text.class);
>>     mos = new MultipleOutputs((JobConf) context.getConfiguration());
>>     for (Text line : values) {
>>       // This is the problem line:
>>       mos.getCollector(fileName, <reporter goes here>).collect(key, line);
>>     }
>>     mos.close();
>>   }
>>
>> On Mon, Oct 5, 2009 at 11:17 AM, Aaron Kimball <aa...@cloudera.com>
>> wrote:
>>
>>> Geoffry,
>>>
>>> The new API comes with a related output class, called MultipleOutputs
>>> (o.a.h.mapreduce.lib.output.MultipleOutputs). You may want to look
>>> into using this instead.
>>>
>>> - Aaron
>>>
>>> On Tue, Sep 29, 2009 at 4:44 PM, Geoffry Roberts
>>> <geoffry.robe...@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> What I want to do is output multiple files from my reducer, one for
>>>> each key value.
>>>>
>>>> Can this still be done in the current API?
>>>>
>>>> It seems that using MultipleTextOutputFormat requires one to use
>>>> deprecated parts of the API.
>>>>
>>>> Is this correct?
>>>>
>>>> I would like to use the class, or its equivalent, and stay off
>>>> anything deprecated.
>>>>
>>>> Is there a workaround?
>>>>
>>>> In the current API, one uses Job and a class derived from the class
>>>> org.apache.hadoop.mapreduce.OutputFormat. MultipleTextOutputFormat
>>>> does not derive from this class.
>>>>
>>>>   Job.setOutputFormatClass(
>>>>       Class<? extends org.apache.hadoop.mapreduce.OutputFormat>);
>>>>
>>>> In the old, deprecated API, one uses JobConf and an implementation
>>>> of the interface org.apache.hadoop.mapred.OutputFormat.
>>>> MultipleTextOutputFormat is just such an implementation.
>>>>
>>>>   JobConf.setOutputFormat(
>>>>       Class<? extends org.apache.hadoop.mapred.OutputFormat>);
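>>>>
>>>> For concreteness, here is roughly what I mean (a sketch only; MyJob
>>>> and conf are placeholders). The old call accepts
>>>> MultipleTextOutputFormat, while the new one cannot:
>>>>
>>>>   // Old API: MultipleTextOutputFormat implements
>>>>   // org.apache.hadoop.mapred.OutputFormat, so this compiles.
>>>>   org.apache.hadoop.mapred.JobConf jobConf =
>>>>       new org.apache.hadoop.mapred.JobConf(MyJob.class);
>>>>   jobConf.setOutputFormat(
>>>>       org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.class);
>>>>
>>>>   // New API: setOutputFormatClass() only accepts subclasses of
>>>>   // org.apache.hadoop.mapreduce.OutputFormat, which
>>>>   // MultipleTextOutputFormat is not.
>>>>   org.apache.hadoop.mapreduce.Job job =
>>>>       new org.apache.hadoop.mapreduce.Job(conf);
>>>>   job.setOutputFormatClass(
>>>>       org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.class);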