When there are no new catalogs to examine, then the main code can exit.

The easiest way to present this back to the controller is to use the
counter capability.  That way the controller can look at the results of a
map-reduce step to determine how many new catalogs were found.
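
The rough shape of it: the mapper bumps a counter every time it discovers a
new catalog, and the control code reads that counter back once the job
finishes.  A minimal sketch against the old org.apache.hadoop.mapred API
(the enum and names here are made up for illustration):

    // inside the mapper
    enum CatalogCounters { NEW_CATALOGS }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // ... parse the catalog record, emit any child catalogs ...
      reporter.incrCounter(CatalogCounters.NEW_CATALOGS, 1);
    }

    // in the control code, after the job has completed
    RunningJob job = JobClient.runJob(conf);
    long found = job.getCounters().getCounter(CatalogCounters.NEW_CATALOGS);
    if (found == 0) {
      // no new catalogs were discovered this pass -- stop iterating
    }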

You haven't hit a dead end.  This is really a pretty simple program that is
very similar to what nutch does all the time to crawl web sites.


On 10/29/07 6:57 PM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:

> Thanks, Stu...  Maybe my mind is way off track - but I still sense a
> problem with the mapper sending feedback to the job controller.  That
> is, when a mapper has reached the terminal condition, how can it tell
> the job controller to stop?
> 
> If I keep a JobConf object in the mapper, and set a property
> "stop.processing" to true when a mapping task has reached the terminal
> condition, will it cause synchronization problems?  There could be
> other mapping tasks that still wish to go on.
> 
> I tried to find a way for the job controller to open the file in
> the output path at the end of the loop and read its contents; but thus
> far, I haven't seen a way to achieve this.
> 
> Does this mean I have hit a dead-end?
> 
> -- Jim
> 
> 
> 
> On 10/29/07, Stu Hood <[EMAIL PROTECTED]> wrote:
>> The iteration would take place in your control code (your 'main' method, as
>> shown in the examples).
>> 
>> In order to prevent records from looping infinitely, each iteration would
>> need to use a separate output/input directory.
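>>
>> Something like this, roughly (just a sketch against the old mapred API,
>> with made-up paths and class names):
>>
>>     Path input = new Path("/crawl/seed");
>>     int round = 0;
>>     while (true) {
>>       Path output = new Path("/crawl/round-" + round);
>>
>>       JobConf conf = new JobConf(CatalogCrawler.class);
>>       conf.setInputPath(input);
>>       conf.setOutputPath(output);
>>       // ... set mapper, reducer, input/output formats, etc. ...
>>
>>       RunningJob job = JobClient.runJob(conf);
>>
>>       // noNewCatalogs() stands in for whatever termination check you
>>       // use, e.g. reading a counter off the finished job
>>       if (noNewCatalogs(job)) {
>>         break;
>>       }
>>
>>       input = output;   // this round's output feeds the next round
>>       round++;
>>     }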
>> 
>> Thanks,
>> Stu
>> 
>> 
>> -----Original Message-----
>> From: Jim the Standing Bear <[EMAIL PROTECTED]>
>> Sent: Monday, October 29, 2007 5:45pm
>> To: [email protected]
>> Subject: Re: can jobs be launched recursively within a mapper ?
>> 
>> thanks, Owen and David,
>> 
>> I also thought of making a queue so that I can push catalog names onto
>> the end of it, while the job control loop keeps removing items off the
>> queue until there are none left.
>> 
>> However, the problem is I don't see how I can do so within the
>> map/reduce context.  All the code examples are one-shot deals and
>> there is no iteration involved.
>> 
>> Furthermore, what David said made sense, but to avoid an infinite loop,
>> the code must remove the record it just read from the input file.  How
>> do I do that using hadoop's fs?  Or does hadoop take care of it
>> automatically?
>> 
>> -- Jim
>> 
>> 
>> 
>> On 10/29/07, David Balatero <[EMAIL PROTECTED]> wrote:
>>> Aren't these questions a little advanced for a bear to be asking?
>>> I'll be here all night...
>>> 
>>> But seriously, if your job is inherently recursive, one possible way
>>> to do it would be to make sure that you output in the same format
>>> that you input. Then you can keep re-reading the outputted file back
>>> into a new map/reduce job, until you hit some base case and you
>>> terminate. I've had a main method before that would kick off a bunch
>>> of jobs in a row -- but I wouldn't really recommend starting another
>>> map/reduce job in the scope of a running map() or reduce() method.
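>>>
>>> Concretely, that just means configuring the output format to match the
>>> input format -- for example (a sketch; the key/value types are whatever
>>> your catalog records use):
>>>
>>>     conf.setInputFormat(SequenceFileInputFormat.class);
>>>     conf.setOutputFormat(SequenceFileOutputFormat.class);
>>>     conf.setOutputKeyClass(Text.class);
>>>     conf.setOutputValueClass(Text.class);
>>>
>>> Because each job writes a SequenceFile with the same key/value types it
>>> reads, the output directory of one job can be handed straight to the
>>> next one as its input path.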
>>> 
>>> - David
>>> 
>>> 
>>> On Oct 29, 2007, at 2:17 PM, Jim the Standing Bear wrote:
>>> 
>>>> then
>>> 
>>> 
>> 
>> 
>> --
>> --------------------------------------
>> Standing Bear Has Spoken
>> --------------------------------------
>> 
>> 
>> 
> 
