Yes, you pointed me to the slides in a previous thread.  I looked at
them, but while I was reading the nutch source code, they slipped my
mind.  Thank you so much for reminding me again, Ted.

-- Jim

On 10/30/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>
> There is a slide show on nutch that would be much clearer.  I mentioned
> it some time ago.  If you go to the hadoop presentations page on the wiki (
> http://wiki.apache.org/lucene-hadoop/HadoopPresentations) then you will be
> able to find one of the slide shows that goes through the nutch MR steps.
>
>
> On 10/30/07 10:17 AM, "Jim the Standing Bear" <[EMAIL PROTECTED]>
> wrote:
>
> > Thanks for jumping in and giving me your input, Ted.  Yes, intuitively
> > it is an easy project (we had a conversation a few days back); it's just
> > that when it comes to implementation, I am having trouble with the details.
> >
> > I tried to look at nutch's source code, but frankly it wasn't trivial.
> > I guess I will try again, with what you just said in the emails as a
> > guide.
> >
> > -- Jim
> >
> > On 10/30/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >> When there are no new catalogs left to examine, the main code can exit.
> >>
> >> The easiest way to present this back to the controller is by using the
> >> counter capability.  That way the controller can look at the results of a
> >> map-reduce step to determine how many new catalogs were found.
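> >>
> >> Roughly, the mapper side of that idea might look like this (a sketch
> >> against the old org.apache.hadoop.mapred API; the CatalogMapper name,
> >> the CatalogCounter enum, and the "catalog:" record convention are all
> >> made up for illustration):
> >>
> >>   import java.io.IOException;
> >>
> >>   import org.apache.hadoop.io.LongWritable;
> >>   import org.apache.hadoop.io.Text;
> >>   import org.apache.hadoop.mapred.MapReduceBase;
> >>   import org.apache.hadoop.mapred.Mapper;
> >>   import org.apache.hadoop.mapred.OutputCollector;
> >>   import org.apache.hadoop.mapred.Reporter;
> >>
> >>   public class CatalogMapper extends MapReduceBase
> >>       implements Mapper<LongWritable, Text, Text, Text> {
> >>
> >>     // Hypothetical counter the controller inspects after each job.
> >>     public enum CatalogCounter { NEW_CATALOGS }
> >>
> >>     public void map(LongWritable key, Text value,
> >>                     OutputCollector<Text, Text> output,
> >>                     Reporter reporter) throws IOException {
> >>       String line = value.toString();
> >>       // Made-up convention: a line starting with "catalog:" names a
> >>       // catalog that has not been examined yet.
> >>       if (line.startsWith("catalog:")) {
> >>         reporter.incrCounter(CatalogCounter.NEW_CATALOGS, 1);
> >>         output.collect(new Text("catalog"), value);
> >>       }
> >>     }
> >>   }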
> >>
> >> You haven't hit a dead end.  This is really a pretty simple program that is
> >> very similar to what nutch does all the time to crawl web sites.
> >>
> >>
> >> On 10/29/07 6:57 PM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Thanks, Stu...  Maybe my mind is way off track - but I still sense a
> >>> problem with the mapper sending feedback to the job controller.  That
> >>> is, when a mapper has reached the terminal condition, how can it tell
> >>> the job controller to stop?
> >>>
> >>> If I keep a JobConf object in the mapper and set a property
> >>> "stop.processing" to true when a mapping task has reached the terminal
> >>> condition, will it cause synchronization problems?  There could be
> >>> other mapping tasks that still wish to go on.
> >>>
> >>> I tried to find a way for the job controller to open the file in the
> >>> output path at the end of the loop and read its contents, but thus
> >>> far I haven't found one.
> >>>
> >>> Does this mean I have hit a dead-end?
> >>>
> >>> -- Jim
> >>>
> >>>
> >>>
> >>> On 10/29/07, Stu Hood <[EMAIL PROTECTED]> wrote:
> >>>> The iteration would take place in your control code (your 'main'
> >>>> method, as shown in the examples).
> >>>>
> >>>> In order to prevent records from looping infinitely, each iteration would
> >>>> need to use a separate output/input directory.
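> >>>>
> >>>> A sketch of what that control loop might look like (again the old
> >>>> org.apache.hadoop.mapred API; the path names are invented, and
> >>>> CatalogMapper with its NEW_CATALOGS counter is the hypothetical
> >>>> mapper sketched earlier in this thread):
> >>>>
> >>>>   import org.apache.hadoop.fs.Path;
> >>>>   import org.apache.hadoop.io.Text;
> >>>>   import org.apache.hadoop.mapred.FileInputFormat;
> >>>>   import org.apache.hadoop.mapred.FileOutputFormat;
> >>>>   import org.apache.hadoop.mapred.JobClient;
> >>>>   import org.apache.hadoop.mapred.JobConf;
> >>>>   import org.apache.hadoop.mapred.RunningJob;
> >>>>
> >>>>   public class CatalogDriver {
> >>>>     public static void main(String[] args) throws Exception {
> >>>>       Path input = new Path("catalogs/seed");
> >>>>       for (int pass = 0; ; pass++) {
> >>>>         JobConf conf = new JobConf(CatalogDriver.class);
> >>>>         conf.setMapperClass(CatalogMapper.class);
> >>>>         conf.setOutputKeyClass(Text.class);
> >>>>         conf.setOutputValueClass(Text.class);
> >>>>         // A fresh output directory per iteration, so records
> >>>>         // never loop back into the same pass.
> >>>>         Path output = new Path("catalogs/pass_" + pass);
> >>>>         FileInputFormat.setInputPaths(conf, input);
> >>>>         FileOutputFormat.setOutputPath(conf, output);
> >>>>         RunningJob job = JobClient.runJob(conf); // blocks until done
> >>>>         long found = job.getCounters().getCounter(
> >>>>             CatalogMapper.CatalogCounter.NEW_CATALOGS);
> >>>>         if (found == 0) break;  // nothing new: terminal condition
> >>>>         input = output;  // this pass's output feeds the next pass
> >>>>       }
> >>>>     }
> >>>>   }
> >>>>
> >>>> Consumed pass directories could then be cleaned up afterwards, e.g.
> >>>> with FileSystem.delete(), rather than deleting records in place.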
> >>>>
> >>>> Thanks,
> >>>> Stu
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Jim the Standing Bear <[EMAIL PROTECTED]>
> >>>> Sent: Monday, October 29, 2007 5:45pm
> >>>> To: [email protected]
> >>>> Subject: Re: can jobs be launched recursively within a mapper ?
> >>>>
> >>>> thanks, Owen and David,
> >>>>
> >>>> I also thought of making a queue so that I could push catalog names
> >>>> onto the end of it, while the job control loop keeps removing items
> >>>> off the queue until none are left.
> >>>>
> >>>> However, the problem is that I don't see how I can do so within the
> >>>> map/reduce context.  All the code examples are one-shot deals, and
> >>>> there is no iteration involved.
> >>>>
> >>>> Furthermore, what David said made sense, but to avoid an infinite loop,
> >>>> the code must remove the record it just read from the input file.  How
> >>>> do I do that using hadoop's fs?  Or does hadoop take care of it
> >>>> automatically?
> >>>>
> >>>> -- Jim
> >>>>
> >>>>
> >>>>
> >>>> On 10/29/07, David Balatero <[EMAIL PROTECTED]> wrote:
> >>>>> Aren't these questions a little advanced for a bear to be asking?
> >>>>> I'll be here all night...
> >>>>>
> >>>>> But seriously, if your job is inherently recursive, one possible way
> >>>>> to do it would be to make sure that your output is in the same format
> >>>>> as your input.  Then you can keep re-reading the output file back
> >>>>> into a new map/reduce job until you hit some base case and
> >>>>> terminate.  I've had a main method before that would kick off a bunch
> >>>>> of jobs in a row -- but I wouldn't really recommend starting another
> >>>>> map/reduce job within the scope of a running map() or reduce() method.
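> >>>>>
> >>>>> One way to keep the output format identical to the input format is
> >>>>> to read and write SequenceFiles on both ends (a sketch of the
> >>>>> relevant JobConf calls from the old org.apache.hadoop.mapred API;
> >>>>> the Text key/value types are just an assumption):
> >>>>>
> >>>>>   conf.setInputFormat(SequenceFileInputFormat.class);
> >>>>>   conf.setOutputFormat(SequenceFileOutputFormat.class);
> >>>>>   conf.setOutputKeyClass(Text.class);
> >>>>>   conf.setOutputValueClass(Text.class);
> >>>>>
> >>>>> With that in place, each job's output directory can be fed straight
> >>>>> back in as the next job's input path.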
> >>>>>
> >>>>> - David
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> --------------------------------------
> >>>> Standing Bear Has Spoken
> >>>> --------------------------------------
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>


-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
