Amogh,

Thanks for the attachment. I'll hold on to it.
If I may press you a bit further: I noticed that the directory tree in the
distribution I downloaded differs from the various paths I see in the patch,
and differs again from the svn trunk. What I want is to apply the patch to my
Hadoop 0.20.1 distribution, but it doesn't apply cleanly because of this path
mismatch. I suppose I could hack on the patch, but it seems I shouldn't have
to. Why these three differences: release, trunk, patch? Am I using the wrong
code base?

On Mon, Dec 14, 2009 at 9:30 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
> Yes. Also attached is an old thread I have kept handy with me. Hope this
> helps you.
>
> Thanks,
> Amogh
>
> On 12/11/09 10:07 PM, "Geoffry Roberts" <geoffry.robe...@gmail.com> wrote:
>
> Amogh,
>
> I don't have experience with patches for Hadoop.
>
> I take it that I apply this patch using the Linux patch utility.
> I further assume I need only apply the latest patch, which is 5.
>
> Am I correct?
>
> On Wed, Dec 9, 2009 at 7:30 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>
> http://issues.apache.org/jira/browse/MAPREDUCE-370
>
> You'll have to work around for now / try to apply the patch.
>
> Amogh
>
> On 12/9/09 8:54 PM, "Geoffry Roberts" <geoffry.robe...@gmail.com> wrote:
>
> Aaron,
>
> I am using 0.20.1 and I'm not finding
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs. I'm using the
> download page, where the tarball is dated Sep. 09.
>
> Sounds like I need to look at the code repository.
>
> On Tue, Dec 8, 2009 at 1:39 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>
> Geoffry,
>
> There are two MultipleOutputs implementations: one for the new API, one
> for the old one.
>
> The new API (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) does
> not have a getCollector() method. It is intended to work with
> org.apache.hadoop.mapreduce.Mapper and its associated Context object.
> The old API implementation of MultipleOutputs
> (org.apache.hadoop.mapred.lib.MultipleOutputs) is intended to work with
> org.apache.hadoop.mapred.Mapper, Reporter, and friends.
>
> If you're going to use the new org.apache.hadoop.mapreduce-based code,
> you should not need to import anything in the mapred package. That having
> been said, I just realized that the new-API-compatible MultipleOutputs
> implementation is not in Hadoop 0.20; it's only in the unreleased 0.21.
> If you're using 0.20, you should probably stick with the old API for
> your process.
>
> Cheers,
> - Aaron
>
> On Tue, Dec 8, 2009 at 12:40 PM, Geoffry Roberts
> <geoffry.robe...@gmail.com> wrote:
>
> All,
>
> This one has me stumped.
>
> What I want to do is output multiple files from my reducer, one for each
> key value. I also want to avoid any deprecated parts of the API.
>
> As suggested, I switched from MultipleTextOutputFormat to MultipleOutputs
> but have run into an impasse. MultipleOutputs' getCollector method
> requires a Reporter as a parameter, but as far as I can tell, the new API
> doesn't expose one. The only Reporter I can find is in the Context
> object, but it is declared protected.
>
> Am I stuck, or just missing something?
> My code:
>
> @Override
> public void reduce(Text key, Iterable<Text> values, Context context)
>         throws IOException {
>     String fileName = key.toString();
>     MultipleOutputs.addNamedOutput((JobConf) context.getConfiguration(),
>             fileName, OutputFormat.class, Text.class, Text.class);
>     mos = new MultipleOutputs((JobConf) context.getConfiguration());
>     for (Text line : values) {
>         // This is the problem line:
>         mos.getCollector(fileName, <reporter goes here>).collect(key, line);
>     }
>     mos.close();
> }
>
> On Mon, Oct 5, 2009 at 11:17 AM, Aaron Kimball <aa...@cloudera.com> wrote:
>
> Geoffry,
>
> The new API comes with a related OutputFormat, called MultipleOutputs
> (o.a.h.mapreduce.lib.output.MultipleOutputs). You may want to look into
> using this instead.
>
> - Aaron
>
> On Tue, Sep 29, 2009 at 4:44 PM, Geoffry Roberts
> <geoffry.robe...@gmail.com> wrote:
>
> All,
>
> What I want to do is output multiple files from my reducer, one for each
> key value.
>
> Can this still be done in the current API? It seems that using
> MultipleTextOutputFormat requires one to use deprecated parts of the API.
> Is that correct? I would like to use the class, or its equivalent, and
> stay off anything deprecated. Is there a workaround?
>
> In the current API one uses Job and a class derived from
> org.apache.hadoop.mapreduce.OutputFormat; MultipleTextOutputFormat does
> not derive from this class.
>
> Job.setOutputFormatClass(Class<? extends
>         org.apache.hadoop.mapreduce.OutputFormat>);
>
> In the old, deprecated API, one uses JobConf and an implementation of the
> interface org.apache.hadoop.mapred.OutputFormat. MultipleTextOutputFormat
> is just such an implementation.
>
> JobConf.setOutputFormat(Class<? extends
>         org.apache.hadoop.mapred.OutputFormat>);
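[Editor's note] Since the thread concludes that 0.20 users should stick with
the old API, here is a sketch of what Aaron's suggestion might look like with
org.apache.hadoop.mapred.lib.MultipleOutputs. The Reporter the new API hides
arrives as an ordinary parameter of the old-style reduce(). The class name
PerKeyReducer and the named-output name "bykey" are illustrative, not from
the thread.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class PerKeyReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    private MultipleOutputs mos;

    @Override
    public void configure(JobConf conf) {
        // Created once per task attempt, not once per reduce() call.
        mos = new MultipleOutputs(conf);
    }

    @SuppressWarnings("unchecked")
    @Override
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // The multi-name getCollector variant routes records to a
        // per-key file under the "bykey" named output; here the
        // Reporter is simply the reduce() parameter.
        while (values.hasNext()) {
            mos.getCollector("bykey", key.toString(), reporter)
               .collect(key, values.next());
        }
    }

    @Override
    public void close() throws IOException {
        mos.close();  // flush and close all named-output writers
    }
}
```

The named output must be declared once in the driver before the job is
submitted, e.g. MultipleOutputs.addMultiNamedOutput(conf, "bykey",
TextOutputFormat.class, Text.class, Text.class); the multi-name variant is
what produces one file per key. Note that named-output names may contain only
letters and digits, so a raw key.toString() may need sanitizing.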
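[Editor's note] The path mismatch discussed at the top of the thread (release
tree vs. trunk vs. patch paths) is usually handled with the patch utility's
-p option, which strips leading path components rather than requiring you to
hack on the patch itself. A self-contained demo with made-up file names; for
the real case, substitute the JIRA patch and run --dry-run first at each
strip level until one applies cleanly:

```shell
# Build a tiny tree and a patch whose paths carry an extra leading
# component ("a/", "b/"), as diffs generated from a repository root often do.
mkdir -p demo/src
printf 'old\n' > demo/src/f.txt
printf -- '--- a/src/f.txt\n+++ b/src/f.txt\n@@ -1 +1 @@\n-old\n+new\n' \
    > demo/fix.patch

cd demo
# --dry-run reports whether this strip level applies cleanly without
# touching files; -p0 would look for "a/src/f.txt" (absent here), while
# -p1 strips the leading "a/" and finds "src/f.txt".
patch -p1 --dry-run < fix.patch
patch -p1 < fix.patch   # apply for real
cat src/f.txt           # prints "new"
```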