There is a slide show on Nutch that would make this much clearer; I mentioned it some time ago. If you go to the Hadoop presentations page on the wiki (http://wiki.apache.org/lucene-hadoop/HadoopPresentations), you will find a slide show that walks through the Nutch MR steps.
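In case the slides aren't enough, the driver-loop pattern discussed below (run a step, check how many new catalogs it found, stop at zero) can be sketched in plain Java. This is only a sketch of the control flow: `runStep` is a hypothetical stand-in for submitting a real map-reduce job, and the size of the new frontier stands in for reading the job's counters.

```java
import java.util.*;

public class CrawlLoop {
    // Stand-in for one map-reduce step: given this round's catalogs,
    // return the catalogs they reference. In real Hadoop this would be
    // a job submission, with the mapper incrementing a counter for each
    // new catalog and the controller reading that counter afterwards.
    static List<String> runStep(List<String> input, Map<String, List<String>> links) {
        List<String> found = new ArrayList<>();
        for (String catalog : input) {
            found.addAll(links.getOrDefault(catalog, Collections.<String>emptyList()));
        }
        return found;
    }

    // Driver loop: keep launching steps until a step reports zero new
    // catalogs. Returns the number of rounds that were run.
    static int crawl(List<String> seeds, Map<String, List<String>> links) {
        Set<String> seen = new HashSet<>(seeds); // already-processed records
        List<String> frontier = seeds;
        int rounds = 0;
        while (!frontier.isEmpty()) {
            rounds++;
            List<String> next = new ArrayList<>();
            for (String c : runStep(frontier, links)) {
                if (seen.add(c)) next.add(c);   // drop duplicates so cycles terminate
            }
            frontier = next;  // each round consumes the previous round's output
        }
        return rounds;
    }
}
```

The key point is that the termination test lives in the controller, not in the mappers, so no mapper ever has to signal "stop" back through the `JobConf`.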
On 10/30/07 10:17 AM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:

> Thanks for jumping in and giving me input, Ted. Yes, intuitively it
> is an easy project (we had a conversation a few days back), except
> when it comes to implementation, I am having trouble with the details.
>
> I tried to look at nutch's source code, but frankly it wasn't trivial.
> I guess I will try again, with what you just said in the emails as a
> guide.
>
> -- Jim
>
> On 10/30/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>> When there are no new catalogs to examine, then the main code can exit.
>>
>> The easiest way to present this back to the controller is by using the
>> counter capability. That way the controller can look at the results of a
>> map-reduce step to determine how many new catalogs were found.
>>
>> You haven't hit a dead end. This is really a pretty simple program that is
>> very similar to what nutch does all the time to crawl web sites.
>>
>> On 10/29/07 6:57 PM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks, Stu... Maybe my mind is way off track - but I still sense a
>>> problem with the mapper sending feedback to the job controller. That
>>> is, when a mapper has reached the terminal condition, how can it tell
>>> the job controller to stop?
>>>
>>> If I keep a JobConf object in the mapper, and set a property
>>> "stop.processing" to true when a mapping task has reached the terminal
>>> condition, will it cause synchronization problems? There could be
>>> other mapping tasks that still wish to go on.
>>>
>>> I tried to find a way so that the job controller can open the file in
>>> the output path at the end of the loop to read the contents, but thus
>>> far, I haven't seen a way to achieve this.
>>>
>>> Does this mean I have hit a dead end?
>>>
>>> -- Jim
>>>
>>> On 10/29/07, Stu Hood <[EMAIL PROTECTED]> wrote:
>>>> The iteration would take place in your control code (your 'main' method, as
>>>> shown in the examples).
>>>>
>>>> In order to prevent records from looping infinitely, each iteration would
>>>> need to use a separate output/input directory.
>>>>
>>>> Thanks,
>>>> Stu
>>>>
>>>> -----Original Message-----
>>>> From: Jim the Standing Bear <[EMAIL PROTECTED]>
>>>> Sent: Monday, October 29, 2007 5:45pm
>>>> To: [email protected]
>>>> Subject: Re: can jobs be launched recursively within a mapper?
>>>>
>>>> Thanks, Owen and David.
>>>>
>>>> I also thought of making a queue so that I can push catalog names onto
>>>> the end of it, while the job control loop keeps removing items off the
>>>> queue until there are none left.
>>>>
>>>> However, the problem is I don't see how I can do so within the
>>>> map/reduce context. All the code examples are one-shot deals and
>>>> there is no iteration involved.
>>>>
>>>> Furthermore, what David said made sense, but to avoid an infinite loop,
>>>> the code must remove the record it just read from the input file. How
>>>> do I do that using Hadoop's fs? Or does Hadoop take care of it
>>>> automatically?
>>>>
>>>> -- Jim
>>>>
>>>> On 10/29/07, David Balatero <[EMAIL PROTECTED]> wrote:
>>>>> Aren't these questions a little advanced for a bear to be asking?
>>>>> I'll be here all night...
>>>>>
>>>>> But seriously, if your job is inherently recursive, one possible way
>>>>> to do it would be to make sure that you output in the same format
>>>>> that you input. Then you can keep re-reading the outputted file back
>>>>> into a new map/reduce job, until you hit some base case and you
>>>>> terminate. I've had a main method before that would kick off a bunch
>>>>> of jobs in a row -- but I wouldn't really recommend starting another
>>>>> map/reduce job in the scope of a running map() or reduce() method.
>>>>>
>>>>> - David
>>>>>
>>>>> On Oct 29, 2007, at 2:17 PM, Jim the Standing Bear wrote:
>>>>>
>>>>>> then
>>>>
>>>> --
>>>> --------------------------------------
>>>> Standing Bear Has Spoken
>>>> --------------------------------------
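Stu's and David's suggestions combine into one pattern: each iteration writes to a fresh directory, in the same format it reads, and the next iteration consumes that directory. A minimal plain-Java sketch, with an in-memory map standing in for the filesystem and `runJob` as a hypothetical stand-in for a real job submission:

```java
import java.util.*;

public class IterationDirs {
    // In-memory stand-in for the filesystem: directory name -> records.
    // On a real cluster these would be HDFS paths such as "crawl/iter-3".
    static Map<String, List<String>> fs = new HashMap<>();

    // One "job": reads records from inDir and writes records *in the same
    // format* to outDir, so the next job can consume outDir directly.
    // The toy transform shrinks each record; real code would expand catalogs.
    static void runJob(String inDir, String outDir) {
        List<String> out = new ArrayList<>();
        for (String rec : fs.get(inDir)) {
            if (rec.length() > 1) out.add(rec.substring(1)); // toy base case
        }
        fs.put(outDir, out);
    }

    // Driver: a fresh directory per iteration, so no record is ever re-read
    // by the job that produced it -- nothing can loop forever. Returns the
    // number of jobs run before an iteration produced no output.
    static int drive(List<String> seed) {
        fs.put("iter-0", seed);
        int i = 0;
        while (!fs.get("iter-" + i).isEmpty()) {
            runJob("iter-" + i, "iter-" + (i + 1));
            i++;
        }
        return i;
    }
}
```

Because every round gets its own output directory, there is no need to delete processed records from the input file: the old input simply isn't read again, which answers the "how do I remove the record using Hadoop's fs" question above.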
