Hi Aayush, have you been able to find a solution for the multi-level map/reduce? I am also stuck on this problem and cannot find a way out. Can you help me?
Thanks

Aayush Garg wrote:
>
> Hi,
>
> I have not used a Lucene index before, and I do not see how to build one
> with Hadoop Map/Reduce. What I was really looking for is how to implement
> a multi-level map/reduce for the problem I described.
>
>
> On Fri, Apr 4, 2008 at 7:23 PM, Ning Li <ning.li...@gmail.com> wrote:
>
>> You can build Lucene indexes using Hadoop Map/Reduce. See the index
>> contrib package in the trunk. Or is it still not something you are
>> looking for?
>>
>> Regards,
>> Ning
>>
>> On 4/4/08, Aayush Garg <aayush.g...@gmail.com> wrote:
>>
>>> No, currently my requirement is to solve this problem with Apache Hadoop.
>>> I am trying to build up this type of inverted index and then measure its
>>> performance against other approaches.
>>>
>>> Thanks,
>>>
>>>
>>> On Fri, Apr 4, 2008 at 5:54 PM, Ted Dunning <tdunn...@veoh.com> wrote:
>>>
>>>> Are you implementing this for instruction or production?
>>>>
>>>> If production, why not use Lucene?
>>>>
>>>>
>>>> On 4/3/08 6:45 PM, "Aayush Garg" <aayush.g...@gmail.com> wrote:
>>>>
>>>>> Hi Amar, Theodore, Arun,
>>>>>
>>>>> Thanks for your reply. Actually I am new to Hadoop, so I can't figure
>>>>> out much. I have written the following code for the inverted index.
>>>>> It maps each word in a document to its document id, e.g.:
>>>>>
>>>>>   apple  file1 file123
>>>>>
>>>>> The main functions of the code are (imports and the DocIDs writable
>>>>> class omitted):
>>>>>
>>>>> public class HadoopProgram extends Configured implements Tool {
>>>>>
>>>>>   public static class MapClass extends MapReduceBase
>>>>>       implements Mapper<LongWritable, Text, Text, Text> {
>>>>>
>>>>>     private Text word = new Text();
>>>>>     private Text doc = new Text();
>>>>>     private long numRecords = 0;
>>>>>     private String inputFile;
>>>>>
>>>>>     public void configure(JobConf job) {
>>>>>       // name of the file this map task is reading
>>>>>       inputFile = job.get("map.input.file");
>>>>>       System.out.println("Configuring map for input file " + inputFile);
>>>>>     }
>>>>>
>>>>>     public void map(LongWritable key, Text value,
>>>>>                     OutputCollector<Text, Text> output,
>>>>>                     Reporter reporter) throws IOException {
>>>>>       // emit <word, document> for every token in the line
>>>>>       StringTokenizer itr = new StringTokenizer(value.toString());
>>>>>       doc.set(inputFile);
>>>>>       while (itr.hasMoreTokens()) {
>>>>>         word.set(itr.nextToken());
>>>>>         output.collect(word, doc);
>>>>>       }
>>>>>       if (++numRecords % 4 == 0) {
>>>>>         System.out.println("Processed " + numRecords + " records of " + inputFile);
>>>>>       }
>>>>>     }
>>>>>   }
>>>>>
>>>>>   /**
>>>>>    * A reducer class that collects, for each word, the list of
>>>>>    * document ids it appears in.
>>>>>    */
>>>>>   public static class Reduce extends MapReduceBase
>>>>>       implements Reducer<Text, Text, Text, DocIDs> {
>>>>>
>>>>>     // K2, V2 -> K3, V3: <Text, Text> in, <Text, DocIDs> out
>>>>>     public void reduce(Text key, Iterator<Text> values,
>>>>>                        OutputCollector<Text, DocIDs> output,
>>>>>                        Reporter reporter) throws IOException {
>>>>>       ArrayList<String> ids = new ArrayList<String>();
>>>>>       while (values.hasNext()) {
>>>>>         ids.add(values.next().toString());
>>>>>       }
>>>>>       DocIDs dc = new DocIDs();
>>>>>       dc.setListdocs(ids);
>>>>>       output.collect(key, dc);
>>>>>     }
>>>>>   }
>>>>>
>>>>>   public int run(String[] args) throws Exception {
>>>>>     JobConf conf = new JobConf(getConf(), HadoopProgram.class);
>>>>>     conf.setJobName("invertedindex");
>>>>>
>>>>>     // the keys are words; map emits <Text, Text>, reduce emits <Text, DocIDs>
>>>>>     conf.setOutputKeyClass(Text.class);
>>>>>     conf.setMapOutputValueClass(Text.class);
>>>>>     conf.setOutputValueClass(DocIDs.class);
>>>>>
>>>>>     conf.setMapperClass(MapClass.class);
>>>>>     conf.setReducerClass(Reduce.class);
>>>>>
>>>>>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>>>>>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>>>>>     JobClient.runJob(conf);
>>>>>     return 0;
>>>>>   }
>>>>> }
>>>>>
>>>>> Now I am getting output from the reducer like:
>>>>>
>>>>>   word  \root\test\test123, \root\test12
>>>>>
>>>>> In the next stage I want to remove stop words, scrub words etc., and
>>>>> also record the position of each word in the document. How would I
>>>>> apply multiple maps, or chain map/reduce jobs, programmatically? I
>>>>> guess I need to make another class or add some functions to this one,
>>>>> but I am not able to figure it out. Any pointers for this type of
>>>>> problem?
>>>>>
>>>>> Thanks,
>>>>> Aayush
>>>>>
>>>>>
>>>>> On Thu, Mar 27, 2008 at 6:14 AM, Amar Kamat <ama...@yahoo-inc.com> wrote:
>>>>>
>>>>>> On Wed, 26 Mar 2008, Aayush Garg wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I am developing a simple inverted index program with Hadoop. My map
>>>>>>> function has the output:
>>>>>>> <word, doc>
>>>>>>> and the reducer has:
>>>>>>> <word, list(docs)>
>>>>>>>
>>>>>>> Now I want to use one more map/reduce to remove stop and scrub words
>>>>>>> from this output.
>>>>>>
>>>>>> Use the distributed cache, as Arun mentioned.
>>>>>>
>>>>>>> Also in the next stage I would like to have a short summary
>>>>>>> associated with every word.
>>>>>>
>>>>>> Whether to use a separate MR job depends on what exactly you mean by
>>>>>> summary. If it is like a window around the current word then you can
>>>>>> possibly do it in one go.
>>>>>> Amar
>>>>>>
>>>>>>> How should I design my program from this stage? I mean, how would I
>>>>>>> apply multiple map/reduce passes to this? What would be the better
>>>>>>> way to perform this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Regards,
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Aayush Garg,
>>> Phone: +41 76 482 240
>>>
>
>
> --
> Aayush Garg,
> Phone: +41 76 482 240
>

--
View this message in context: http://old.nabble.com/Hadoop%3A-Multiple-map-reduce-or-some-better-way-tp16309172p34009971.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
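For anyone landing on this thread with the same question about chaining jobs "programmatically": with the old JobConf API used above, the usual pattern is simply to configure one JobConf per stage inside run(), point each stage's input at the previous stage's output directory, and submit the stages in order with JobClient.runJob(), which blocks until the submitted job finishes. Below is a rough, untested sketch along those lines. StopWordMapper, the stopwords.txt file, and the intermediate path are made-up names for illustration, and the stop-word list is shipped to the tasks with the DistributedCache as Amar and Arun suggested.

// Sketch only (old org.apache.hadoop.mapred API); needs org.apache.hadoop.fs.Path,
// org.apache.hadoop.mapred.{JobConf, JobClient, FileInputFormat, FileOutputFormat,
// SequenceFileInputFormat, SequenceFileOutputFormat} and
// org.apache.hadoop.filecache.DistributedCache.
public int run(String[] args) throws Exception {
  Path input        = new Path(args[0]);
  Path intermediate = new Path(args[1] + "-tmp");   // stage 1 output, stage 2 input
  Path output       = new Path(args[1]);

  // Stage 1: build the inverted index (word -> list of doc ids), as in the code above.
  JobConf indexJob = new JobConf(getConf(), HadoopProgram.class);
  indexJob.setJobName("inverted-index");
  indexJob.setMapperClass(MapClass.class);
  indexJob.setReducerClass(Reduce.class);
  indexJob.setOutputKeyClass(Text.class);
  indexJob.setMapOutputValueClass(Text.class);
  indexJob.setOutputValueClass(DocIDs.class);
  // write a SequenceFile so stage 2 can read back <Text, DocIDs> pairs directly
  indexJob.setOutputFormat(SequenceFileOutputFormat.class);
  FileInputFormat.setInputPaths(indexJob, input);
  FileOutputFormat.setOutputPath(indexJob, intermediate);
  JobClient.runJob(indexJob);                        // blocks until stage 1 completes

  // Stage 2: map-only pass that drops stop/scrub words from the stage 1 output.
  // StopWordMapper is a hypothetical Mapper<Text, DocIDs, Text, DocIDs> that loads
  // stopwords.txt from the distributed cache in configure() and skips listed words.
  JobConf filterJob = new JobConf(getConf(), HadoopProgram.class);
  filterJob.setJobName("stopword-filter");
  filterJob.setMapperClass(StopWordMapper.class);
  filterJob.setNumReduceTasks(0);                    // no reduce needed for filtering
  filterJob.setOutputKeyClass(Text.class);
  filterJob.setOutputValueClass(DocIDs.class);
  filterJob.setInputFormat(SequenceFileInputFormat.class);
  DistributedCache.addCacheFile(new Path("stopwords.txt").toUri(), filterJob);
  FileInputFormat.setInputPaths(filterJob, intermediate);
  FileOutputFormat.setOutputPath(filterJob, output);
  JobClient.runJob(filterJob);                       // runs only after stage 1 succeeded

  return 0;
}

One design point to keep in mind: each pass only sees what the previous one wrote to HDFS, so if a later stage needs word positions (as asked above), the first mapper has to emit them as part of its value; they cannot be recovered afterwards.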