Or you can put the further processing in another map-reduce job, making the whole thing a chain of map-reduce jobs.
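A chain like this can be sketched with the new-API `Job` driver below. This is a minimal sketch, not code from the thread: the class names (`JobChainDriver`, `FirstMapper`, `FirstReducer`, `SecondMapper`, `SecondReducer`) and the path arguments are placeholders; the point is only that the second job reads the files the first job wrote, and that the driver checks `waitForCompletion` before starting the next stage.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);  // handoff between the jobs
        Path output = new Path(args[2]);

        // First job: the original computation.
        Job first = new Job(conf, "first-pass");
        first.setJarByClass(JobChainDriver.class);
        first.setMapperClass(FirstMapper.class);      // placeholder classes
        first.setReducerClass(FirstReducer.class);
        first.setOutputKeyClass(Text.class);
        first.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);
        if (!first.waitForCompletion(true)) {
            System.exit(1);  // stop the chain if the first job fails
        }

        // Second job: the "further processing", fed by the first job's output files.
        Job second = new Job(conf, "further-processing");
        second.setJarByClass(JobChainDriver.class);
        second.setMapperClass(SecondMapper.class);
        second.setReducerClass(SecondReducer.class);
        second.setOutputKeyClass(Text.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, output);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```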
Jeff Zhang

On Fri, Nov 27, 2009 at 10:38 PM, Jeff Zhang <[email protected]> wrote:

> Hi Liu,
>
> The reducer task runs in its own JVM, so you have to put your modules into
> the reducer task if you really want to access the output in memory.
>
> I am not sure of the size of your output. If it is not large, I suggest
> putting it in a message, wrapping your modules in a listener, and sending
> the message to that listener for further processing.
>
> If the size of your output is large, I suggest you store it in HDFS, put
> the location in a message, and send the message to the listener.
> Because you said your modules are complicated, I suggest you separate them
> into map-reduce jobs as I mentioned above; it will improve the
> maintainability and extensibility of your system.
>
> Jeff Zhang
>
> On Fri, Nov 27, 2009 at 9:45 PM, Liu Xianglong <[email protected]> wrote:
>
>> Hi, Jeff. Thanks for your reply. Actually, I will do further processing of
>> the map-reduce output. If I cannot store it in memory, other modules cannot
>> process it. If these modules are integrated into map-reduce, then they will
>> finish the processing inside the map-reduce jobs. The problem is that these
>> modules are complicated. The easy way would be to store the output of the
>> jobs in memory. What do you think? Do you have such experience?
>>
>> --------------------------------------------------
>> From: "Jeff Zhang" <[email protected]>
>> Sent: Friday, November 27, 2009 10:46 PM
>> To: <[email protected]>
>> Subject: Re: Store mapreduce output into my own data structures
>>
>>> So how do you plan to integrate your other modules with hadoop?
>>>
>>> Put them in the reduce phase?
>>>
>>> Jeff Zhang
>>>
>>> On Fri, Nov 27, 2009 at 6:37 AM, <[email protected]> wrote:
>>>
>>>> Actually I want the output to be usable by other modules. So they have
>>>> to read the output from HDFS files? Or integrate these modules into
>>>> map-reduce? Are there other ways?
>>>>
>>>> --------------------------------------------------
>>>> From: "Jeff Zhang" <[email protected]>
>>>> Sent: Friday, November 27, 2009 10:00 PM
>>>> To: <[email protected]>
>>>> Subject: Re: Store mapreduce output into my own data structures
>>>>
>>>>> Hi Liu,
>>>>>
>>>>> Why do you want to store the output in memory? You cannot use the
>>>>> output outside of the reducer.
>>>>> Actually, at the beginning the output of the reducer is in memory, and
>>>>> then the OutputFormat writes these data to the file system or another
>>>>> data store.
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>> 2009/11/27 Liu Xianglong <[email protected]>
>>>>>
>>>>>> Hi, everyone. Is there someone who uses map-reduce to store the reduce
>>>>>> output in memory? I mean, currently the output path of the job is set
>>>>>> and the reduce outputs are stored in files under this path (see the
>>>>>> comments along with the following code):
>>>>>>
>>>>>> job.setOutputFormatClass(MyOutputFormat.class);
>>>>>> // can I implement my OutputFormat to store these output key-value
>>>>>> // pairs in my data structures, or are there other ways to do it?
>>>>>> job.setOutputKeyClass(ImmutableBytesWritable.class);
>>>>>> job.setOutputValueClass(Result.class);
>>>>>> FileOutputFormat.setOutputPath(job, outputDir);
>>>>>>
>>>>>> Is there any way to store them in some variables or data structures?
>>>>>> Then how can I implement my OutputFormat? Any suggestions and code are
>>>>>> welcome.
>>>>>>
>>>>>> Another question: is there some way to set the number of map tasks? It
>>>>>> seems there is no API to do this in the new Hadoop job APIs. I am not
>>>>>> sure of the way to set this number.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Best Wishes!
>>>>>> _____________________________________________________________
>>>>>>
>>>>>> 刘祥龙 Liu Xianglong
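A custom OutputFormat along the lines Liu asks about can be sketched as below. This is a sketch only, with the caveat Jeff raises built in: the static list lives inside the reduce task's JVM, so only modules running in that same task can see it. `InMemoryOutputFormat` is a made-up name, not a Hadoop class; the no-op committer mimics what `NullOutputFormat` does.

```java
import java.io.IOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch: collect reduce output into an in-memory list instead of files.
// The list exists only inside the reduce task's JVM, so it is useful only
// to modules invoked from within that task (e.g. from close()).
public class InMemoryOutputFormat<K, V> extends OutputFormat<K, V> {

    public static final List<SimpleEntry<Object, Object>> RESULTS =
            new ArrayList<SimpleEntry<Object, Object>>();

    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext context) {
        return new RecordWriter<K, V>() {
            @Override
            public void write(K key, V value) {
                RESULTS.add(new SimpleEntry<Object, Object>(key, value));
            }
            @Override
            public void close(TaskAttemptContext context) {
                // hand RESULTS to the in-task modules here
            }
        };
    }

    @Override
    public void checkOutputSpecs(JobContext context) {
        // nothing to check: no output path is used
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context)
            throws IOException {
        // A do-nothing committer, since there are no files to commit.
        return new OutputCommitter() {
            public void setupJob(JobContext c) { }
            public void setupTask(TaskAttemptContext c) { }
            public boolean needsTaskCommit(TaskAttemptContext c) { return false; }
            public void commitTask(TaskAttemptContext c) { }
            public void abortTask(TaskAttemptContext c) { }
        };
    }
}
```

It would be plugged in with `job.setOutputFormatClass(InMemoryOutputFormat.class);` in place of the `FileOutputFormat` setup, but given the JVM-isolation caveat, the HDFS-plus-notification approach from earlier in the thread scales better.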

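Jeff's "store the output in HDFS, put the location in a message, and send it to a listener" suggestion can be sketched in plain Java. Everything here is hypothetical naming, not Hadoop API: `OutputListener` and `OutputNotifier` are made-up types, and the driver would call `publish` after `waitForCompletion` succeeds, passing only the output path rather than the data itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback the downstream modules implement: they receive
// only the HDFS location of the finished job's output, not the data.
interface OutputListener {
    void onJobOutput(String hdfsPath);
}

// A registry the job driver notifies once the map-reduce job completes.
class OutputNotifier {
    private final List<OutputListener> listeners = new ArrayList<>();

    void register(OutputListener listener) {
        listeners.add(listener);
    }

    // Called after job.waitForCompletion(true) returns true.
    void publish(String hdfsPath) {
        for (OutputListener listener : listeners) {
            listener.onJobOutput(hdfsPath);
        }
    }
}

public class ListenerSketch {
    public static void main(String[] args) {
        OutputNotifier notifier = new OutputNotifier();
        List<String> received = new ArrayList<>();
        notifier.register(received::add);  // a module that records the path
        // Simulate the driver announcing where the job wrote its output:
        notifier.publish("hdfs://namenode/user/liu/job-output");
        System.out.println(received.get(0));
    }
}
```

Each downstream module then opens the path itself, so the reduce output never has to fit in the driver's memory.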