Hi Aayush, have you been able to find a solution for the multi-level map/reduce? I am also stuck on this problem and cannot find a way out. Can you help me?
Thanks

Aayush Garg wrote:
>
> Hi,
>
> I have not used a Lucene index before, and I do not see how to build one
> with Hadoop Map/Reduce. What I was really looking for is how to implement
> a multi-level map/reduce for the problem I described.
>
>
> On Fri, Apr 4, 2008 at 7:23 PM, Ning Li <ning.li...@gmail.com> wrote:
>
>> You can build Lucene indexes using Hadoop Map/Reduce. See the index
>> contrib package in the trunk. Or is it still not something you are
>> looking for?
>>
>> Regards,
>> Ning
>>
>> On 4/4/08, Aayush Garg <aayush.g...@gmail.com> wrote:
>>
>>> No, currently my requirement is to solve this problem with Apache Hadoop.
>>> I am trying to build up this type of inverted index and then measure its
>>> performance against other approaches.
>>>
>>> Thanks,
>>>
>>>
>>> On Fri, Apr 4, 2008 at 5:54 PM, Ted Dunning <tdunn...@veoh.com> wrote:
>>>
>>>> Are you implementing this for instruction or production?
>>>>
>>>> If production, why not use Lucene?
>>>>
>>>>
>>>> On 4/3/08 6:45 PM, "Aayush Garg" <aayush.g...@gmail.com> wrote:
>>>>
>>>>> Hi Amar, Theodore, Arun,
>>>>>
>>>>> Thanks for your reply. Actually I am new to Hadoop, so I can't figure
>>>>> out much. I have written the following code for the inverted index.
>>>>> It maps each word in a document to its document id, e.g.:
>>>>>
>>>>>   apple  file1 file123
>>>>>
>>>>> The main functions of the code are (imports and the DocIDs writable
>>>>> class omitted):
>>>>>
>>>>> public class HadoopProgram extends Configured implements Tool {
>>>>>
>>>>>   public static class MapClass extends MapReduceBase
>>>>>       implements Mapper<LongWritable, Text, Text, Text> {
>>>>>
>>>>>     private Text word = new Text();
>>>>>     private Text doc = new Text();
>>>>>     private long numRecords = 0;
>>>>>     private String inputFile;
>>>>>
>>>>>     public void configure(JobConf job) {
>>>>>       // name of the file this map task is reading
>>>>>       inputFile = job.get("map.input.file");
>>>>>       System.out.println("Configuring map for input file " + inputFile);
>>>>>     }
>>>>>
>>>>>     public void map(LongWritable key, Text value,
>>>>>                     OutputCollector<Text, Text> output,
>>>>>                     Reporter reporter) throws IOException {
>>>>>       // emit <word, document> for every token in the line
>>>>>       StringTokenizer itr = new StringTokenizer(value.toString());
>>>>>       doc.set(inputFile);
>>>>>       while (itr.hasMoreTokens()) {
>>>>>         word.set(itr.nextToken());
>>>>>         output.collect(word, doc);
>>>>>       }
>>>>>       if (++numRecords % 4 == 0) {
>>>>>         System.out.println("Processed " + numRecords + " records of " + inputFile);
>>>>>       }
>>>>>     }
>>>>>   }
>>>>>
>>>>>   /**
>>>>>    * A reducer class that collects, for each word, the list of
>>>>>    * document ids it appears in.
>>>>>    */
>>>>>   public static class Reduce extends MapReduceBase
>>>>>       implements Reducer<Text, Text, Text, DocIDs> {
>>>>>
>>>>>     // K2, V2 -> K3, V3: <Text, Text> in, <Text, DocIDs> out
>>>>>     public void reduce(Text key, Iterator<Text> values,
>>>>>                        OutputCollector<Text, DocIDs> output,
>>>>>                        Reporter reporter) throws IOException {
>>>>>       ArrayList<String> ids = new ArrayList<String>();
>>>>>       while (values.hasNext()) {
>>>>>         ids.add(values.next().toString());
>>>>>       }
>>>>>       DocIDs dc = new DocIDs();
>>>>>       dc.setListdocs(ids);
>>>>>       output.collect(key, dc);
>>>>>     }
>>>>>   }
>>>>>
>>>>>   public int run(String[] args) throws Exception {
>>>>>     JobConf conf = new JobConf(getConf(), HadoopProgram.class);
>>>>>     conf.setJobName("invertedindex");
>>>>>
>>>>>     // the keys are words; map emits <Text, Text>, reduce emits <Text, DocIDs>
>>>>>     conf.setOutputKeyClass(Text.class);
>>>>>     conf.setMapOutputValueClass(Text.class);
>>>>>     conf.setOutputValueClass(DocIDs.class);
>>>>>
>>>>>     conf.setMapperClass(MapClass.class);
>>>>>     conf.setReducerClass(Reduce.class);
>>>>>
>>>>>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>>     JobClient.runJob(conf);
>>>>>     return 0;
>>>>>   }
>>>>> }
>>>>>
>>>>> Now I am getting output from the reducer like:
>>>>>
>>>>>   word  \root\test\test123, \root\test12
>>>>>
>>>>> In the next stage I want to remove stop words, scrub words etc., and
>>>>> also record the position of each word in the document. How would I
>>>>> apply multiple maps, or chain map/reduce jobs, programmatically? I
>>>>> guess I need to make another class or add some functions to this one,
>>>>> but I am not able to figure it out. Any pointers for this type of
>>>>> problem?
>>>>>
>>>>> Thanks,
>>>>> Aayush
>>>>>
>>>>>
>>>>> On Thu, Mar 27, 2008 at 6:14 AM, Amar Kamat <ama...@yahoo-inc.com> wrote:
>>>>>
>>>>>> On Wed, 26 Mar 2008, Aayush Garg wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am developing a simple inverted index program with Hadoop. My map
>>>>>>> function has the output:
>>>>>>> <word, doc>
>>>>>>> and the reducer has:
>>>>>>> <word, list(docs)>
>>>>>>>
>>>>>>> Now I want to use one more map/reduce to remove stop and scrub words
>>>>>>> from this output.
>>>>>>
>>>>>> Use the distributed cache, as Arun mentioned.
>>>>>>
>>>>>>> Also in the next stage I would like to have a short summary
>>>>>>> associated with every word.
>>>>>>
>>>>>> Whether to use a separate MR job depends on what exactly you mean by
>>>>>> summary. If it is like a window around the current word then you can
>>>>>> possibly do it in one go.
>>>>>> Amar
>>>>>>
>>>>>>> How should I design my program from this stage? I mean, how would I
>>>>>>> apply multiple map/reduce passes to this? What would be the better
>>>>>>> way to perform this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Regards,
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Aayush Garg,
>>> Phone: +41 76 482 240
>>>
>
>
> --
> Aayush Garg,
> Phone: +41 76 482 240
>

--
View this message in context: http://old.nabble.com/Hadoop%3A-Multiple-map-reduce-or-some-better-way-tp16309172p34009971.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
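For anyone landing on this thread with the same question about chaining jobs "programmatically": with the old JobConf API used above, the usual pattern is simply to configure one JobConf per stage inside run(), point each stage's input at the previous stage's output directory, and submit the stages in order with JobClient.runJob(), which blocks until the submitted job finishes. Below is a rough, untested sketch along those lines. StopWordMapper, the stopwords.txt file, and the intermediate path are made-up names for illustration, and the stop-word list is shipped to the tasks with the DistributedCache as Amar and Arun suggested.

// Sketch only (old org.apache.hadoop.mapred API); needs org.apache.hadoop.fs.Path,
// org.apache.hadoop.mapred.{JobConf, JobClient, FileInputFormat, FileOutputFormat,
// SequenceFileInputFormat, SequenceFileOutputFormat} and
// org.apache.hadoop.filecache.DistributedCache.
public int run(String[] args) throws Exception {
  Path input        = new Path(args[0]);
  Path intermediate = new Path(args[1] + "-tmp");   // stage 1 output, stage 2 input
  Path output       = new Path(args[1]);

  // Stage 1: build the inverted index (word -> list of doc ids), as in the code above.
  JobConf indexJob = new JobConf(getConf(), HadoopProgram.class);
  indexJob.setJobName("inverted-index");
  indexJob.setMapperClass(MapClass.class);
  indexJob.setReducerClass(Reduce.class);
  indexJob.setOutputKeyClass(Text.class);
  indexJob.setMapOutputValueClass(Text.class);
  indexJob.setOutputValueClass(DocIDs.class);
  // write a SequenceFile so stage 2 can read back <Text, DocIDs> pairs directly
  indexJob.setOutputFormat(SequenceFileOutputFormat.class);
  FileInputFormat.setInputPaths(indexJob, input);
  FileOutputFormat.setOutputPath(indexJob, intermediate);
  JobClient.runJob(indexJob);                        // blocks until stage 1 completes

  // Stage 2: map-only pass that drops stop/scrub words from the stage 1 output.
  // StopWordMapper is a hypothetical Mapper<Text, DocIDs, Text, DocIDs> that loads
  // stopwords.txt from the distributed cache in configure() and skips listed words.
  JobConf filterJob = new JobConf(getConf(), HadoopProgram.class);
  filterJob.setJobName("stopword-filter");
  filterJob.setMapperClass(StopWordMapper.class);
  filterJob.setNumReduceTasks(0);                    // no reduce needed for filtering
  filterJob.setOutputKeyClass(Text.class);
  filterJob.setOutputValueClass(DocIDs.class);
  filterJob.setInputFormat(SequenceFileInputFormat.class);
  DistributedCache.addCacheFile(new Path("stopwords.txt").toUri(), filterJob);
  FileInputFormat.setInputPaths(filterJob, intermediate);
  FileOutputFormat.setOutputPath(filterJob, output);
  JobClient.runJob(filterJob);                       // runs only after stage 1 succeeded

  return 0;
}

One design point to keep in mind: each pass only sees what the previous one wrote to HDFS, so if a later stage needs word positions (as asked above), the first mapper has to emit them as part of its value; they cannot be recovered afterwards.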