Thanks for jumping in and giving me your input, Ted.  Yes, intuitively it
is an easy project (we had a conversation a few days back), but when
it comes to implementation, I am having trouble with the details.

I tried to look at nutch's source code, but frankly it wasn't trivial.
I guess I will try again, with what you just said in the emails as a
guide.

-- Jim

On 10/30/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>
> When there are no new catalogs to examine, the main code can exit.
>
> The easiest way to present this back to the controller is by using the
> counter capability.  That way, the controller can look at the results of a
> map-reduce step to determine how many new catalogs were found.
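
Just to check that I'm reading this right: the mapper would bump a counter
for each new catalog it turns up, something roughly like the sketch below?
(Only my guess at the org.apache.hadoop.mapred API from the docs -- the
class name, the NEW_CATALOGS counter, and fetchChildCatalogs() are all
names I made up, and de-duplication against already-seen catalogs is
ignored here.)

    import java.io.IOException;
    import java.util.Collections;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CatalogMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // The job controller reads this counter after the job finishes.
      public enum CatalogCounter { NEW_CATALOGS }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Each input line names one catalog; fetchChildCatalogs() stands in
        // for whatever code actually examines it.
        for (String child : fetchChildCatalogs(value.toString())) {
          output.collect(new Text(child), new Text(""));
          reporter.incrCounter(CatalogCounter.NEW_CATALOGS, 1);
        }
      }

      private Iterable<String> fetchChildCatalogs(String catalog) {
        return Collections.emptyList();   // placeholder for the real lookup
      }
    }

Then, once JobClient.runJob(conf) returns, the controller would call
getCounters().getCounter(CatalogMapper.CatalogCounter.NEW_CATALOGS) on the
RunningJob it gets back, to see how many were found -- is that the idea?
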
>
> You haven't hit a dead end.  This is really a pretty simple program that is
> very similar to what nutch does all the time to crawl web sites.
>
>
> On 10/29/07 6:57 PM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:
>
> > Thanks, Stu...  Maybe my mind is way off track - but I still sense a
> > problem with the mapper sending feedback to the job controller.  That
> > is, when a mapper has reached the terminal condition, how can it tell
> > the job controller to stop?
> >
> > If I keep a JobConf object in the mapper, and set a property
> > "stop.processing" to true when a mapping task has reached the terminal
> > condition, will it cause synchronization problems?  There could be
> > other mapping tasks that still want to go on.
> >
> > I tried to find a way for the job controller to open the file in
> > the output path at the end of the loop and read its contents, but thus
> > far I haven't seen how to achieve this.
> >
> > Does this mean I have hit a dead-end?
> >
> > -- Jim
> >
> >
> >
> > On 10/29/07, Stu Hood <[EMAIL PROTECTED]> wrote:
> >> The iteration would take place in your control code (your 'main' method, as
> >> shown in the examples).
> >>
> >> In order to prevent records from looping infinitely, each iteration would
> >> need to use a separate output/input directory.
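
(If I put Stu's and Ted's suggestions together, I think the control code
would come out something like the sketch below -- only a sketch against
the JobConf/JobClient API as I understand it: the base path and the
CatalogCrawler class are invented, and it assumes the CatalogMapper /
NEW_CATALOGS counter from my note further up.)

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class CatalogCrawler {
      public static void main(String[] args) throws Exception {
        Path base = new Path("/user/jim/catalogs");   // made-up location
        long newCatalogs;
        int pass = 0;
        do {
          JobConf conf = new JobConf(CatalogCrawler.class);
          conf.setJobName("catalog-pass-" + pass);
          conf.setMapperClass(CatalogMapper.class);
          conf.setOutputKeyClass(Text.class);
          conf.setOutputValueClass(Text.class);
          // Each pass reads the previous pass's output and writes to a
          // fresh directory, so no record is ever fed back through the
          // same job.
          conf.setInputPath(new Path(base, "pass-" + pass));
          conf.setOutputPath(new Path(base, "pass-" + (pass + 1)));
          RunningJob job = JobClient.runJob(conf);
          newCatalogs = job.getCounters()
              .getCounter(CatalogMapper.CatalogCounter.NEW_CATALOGS);
          pass++;
        } while (newCatalogs > 0);   // no new catalogs found => stop
      }
    }
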
> >>
> >> Thanks,
> >> Stu
> >>
> >>
> >> -----Original Message-----
> >> From: Jim the Standing Bear <[EMAIL PROTECTED]>
> >> Sent: Monday, October 29, 2007 5:45pm
> >> To: [email protected]
> >> Subject: Re: can jobs be launched recursively within a mapper ?
> >>
> >> thanks, Owen and David,
> >>
> >> I also thought of making a queue so that I can push catalog names onto
> >> the end of it, while the job control loop keeps removing items off the
> >> queue until there are none left.
> >>
> >> However, the problem is I don't see how I can do so within the
> >> map/reduce context.  All the code examples are one-shot deals and
> >> there is no iteration involved.
> >>
> >> Furthermore, what David said made sense, but to avoid an infinite loop,
> >> the code must remove the record it just read from the input file.  How
> >> do I do that using Hadoop's fs, or does Hadoop take care of it
> >> automatically?
> >>
> >> -- Jim
> >>
> >>
> >>
> >> On 10/29/07, David Balatero <[EMAIL PROTECTED]> wrote:
> >>> Aren't these questions a little advanced for a bear to be asking?
> >>> I'll be here all night...
> >>>
> >>> But seriously, if your job is inherently recursive, one possible way
> >>> to do it would be to make sure that you output in the same format
> >>> that you take as input. Then you can keep feeding the output file back
> >>> into a new map/reduce job until you hit some base case and terminate.
> >>> I've had a main method before that would kick off a bunch of jobs in
> >>> a row -- but I wouldn't really recommend starting another map/reduce
> >>> job in the scope of a running map() or reduce() method.
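
(The way I read the "same format" point: if each pass writes the same kind
of file it reads -- SequenceFiles on both sides, say -- then pass N+1 can
consume pass N's output directly.  Just one way it might be configured in
the per-pass JobConf; matching text formats would presumably work too.)

    // using the org.apache.hadoop.mapred SequenceFile formats
    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);
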
> >>>
> >>> - David
> >>>
> >>>
> >>> On Oct 29, 2007, at 2:17 PM, Jim the Standing Bear wrote:
> >>>
> >>>> then
> >>>
> >>>
> >>
> >>
> >> --
> >> --------------------------------------
> >> Standing Bear Has Spoken
> >> --------------------------------------
> >>
> >>
> >>
> >
>
>


-- 
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
