There is a slide show on nutch that makes this much clearer.  I mentioned
it some time ago.  If you go to the hadoop presentations page on the wiki (
http://wiki.apache.org/lucene-hadoop/HadoopPresentations) you will be
able to find one of the slide shows that walks through the nutch MR steps.


On 10/30/07 10:17 AM, "Jim the Standing Bear" <[EMAIL PROTECTED]>
wrote:

> Thanks for jumping in and giving me inputs, Ted.  Yes, intuitively it
> is an easy project (we had a conversation a few days back), except
> when it comes to implementation, I am having trouble with the details.
> 
> I tried to look at nutch's source code, but frankly it wasn't trivial.
> I guess I will try again, with what you just said in the emails as a
> guide.
> 
> -- Jim
> 
> On 10/30/07, Ted Dunning <[EMAIL PROTECTED]> wrote:
>> 
>> 
>> When there are no new catalogs to examine, then the main code can exit.
>> 
>> The easiest way to present this back to the controller is by using the
>> counter capability.  That way the controller can look at the results of a
>> map-reduce step to determine how many new catalogs were found.
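To make that concrete, here is a minimal, Hadoop-free sketch of the counter-driven control loop; the toy catalog graph and every name in it are invented for illustration. In a real driver the mapper would bump a counter for each new catalog it finds, and the main loop would read that counter back after each pass, exiting when it reaches zero:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hadoop-free sketch of the counter-driven control loop; all names here are
// hypothetical. In a real Hadoop driver the mapper would increment a counter
// for every new catalog, and the controller would read it back from the job
// results after each pass instead of checking the frontier directly.
public class CatalogCrawl {
    // Toy catalog graph: each catalog lists the catalogs it references.
    static final Map<String, List<String>> GRAPH = Map.of(
            "root", List.of("a", "b"),
            "a", List.of("b", "c"),
            "b", List.of(),
            "c", List.of());

    // One "map-reduce pass": expand the frontier and return only the
    // catalogs never seen before (this is what the counter would count).
    static Set<String> expand(Set<String> frontier, Set<String> seen) {
        Set<String> fresh = new HashSet<>();
        for (String c : frontier)
            for (String ref : GRAPH.getOrDefault(c, List.of()))
                if (seen.add(ref))   // add() is false for already-seen catalogs
                    fresh.add(ref);
        return fresh;
    }

    // The controller loop: run passes until a pass finds nothing new.
    static int crawl(String start) {
        Set<String> seen = new HashSet<>(Set.of(start));
        Set<String> frontier = Set.of(start);
        int passes = 0;
        while (!frontier.isEmpty()) {   // "counter == 0" is the exit condition
            frontier = expand(frontier, seen);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(crawl("root"));
    }
}
```

The key point is that the termination test lives in the driver's loop, not inside any map task.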
>> 
>> You haven't hit a dead end.  This is really a pretty simple program that is
>> very similar to what nutch does all the time to crawl web sites.
>> 
>> 
>> On 10/29/07 6:57 PM, "Jim the Standing Bear" <[EMAIL PROTECTED]> wrote:
>> 
>>> Thanks, Stu...  Maybe my mind is way off track - but I still sense a
>>> problem with the mapper sending feedback to the job controller.  That
>>> is, when a mapper has reached the terminal condition, how can it tell
>>> the job controller to stop?
>>> 
>>> If I keep a JobConf object in the mapper, and set a property
>>> "stop.processing" to true when a mapping task has reached the terminal
>>> condition, will it cause synchronization problems?  There could be
>>> other mapping tasks that still wish to go on.
>>> 
>>> I tried to find a way so that the job controller can open the file in
>>> the output path at the end of the loop to read the contents; but thus
>>> far, I haven't seen a way to achieve this.
>>> 
>>> Does this mean I have hit a dead-end?
>>> 
>>> -- Jim
>>> 
>>> 
>>> 
>>> On 10/29/07, Stu Hood <[EMAIL PROTECTED]> wrote:
>>>> The iteration would take place in your control code (your 'main' method, as
>>>> shown in the examples).
>>>> 
>>>> In order to prevent records from looping infinitely, each iteration would
>>>> need to use a separate output/input directory.
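A small sketch of that directory scheme, with made-up path names ("crawl/iter-N"): the only point is that pass i writes into a directory that pass i+1 then reads, so no pass ever re-reads its own output.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of per-iteration directories; path names are invented for
// illustration. In a real driver these paths would be set as the input
// and output paths on each pass's job configuration.
public class IterationDirs {
    // Input/output pair for one pass: pass i reads iter-i, writes iter-(i+1).
    static String[] pass(String base, int i) {
        return new String[] { base + "/iter-" + i, base + "/iter-" + (i + 1) };
    }

    // Wiring for n passes: each pass's output directory is the next pass's input.
    static List<String[]> plan(String base, int n) {
        List<String[]> io = new ArrayList<>();
        for (int i = 0; i < n; i++)
            io.add(pass(base, i));
        return io;
    }
}
```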
>>>> 
>>>> Thanks,
>>>> Stu
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Jim the Standing Bear <[EMAIL PROTECTED]>
>>>> Sent: Monday, October 29, 2007 5:45pm
>>>> To: [email protected]
>>>> Subject: Re: can jobs be launched recursively within a mapper ?
>>>> 
>>>> thanks, Owen and David,
>>>> 
>>>> I also thought of making a queue so that I can push catalog names to
>>>> the end of it, while the job control loop keeps removing items off the
>>>> queue until there are none left.
>>>> 
>>>> However, the problem is I don't see how I can do so within the
>>>> map/reduce context.  All the code examples are one-shot deals and
>>>> there is no iteration involved.
>>>> 
>>>> Furthermore, what David said made sense, but to avoid an infinite loop,
>>>> the code must remove the record it just read from the input file.  How
>>>> do I do that using hadoop's fs?  Or does hadoop take care of it
>>>> automatically?
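Hadoop does not delete consumed input records for you; the usual approach is for the driver to carry forward a set of already-processed records and subtract it from each pass's output before feeding the remainder back in. A minimal illustrative sketch (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the dedup step that prevents records from looping forever: the
// driver keeps a set of already-processed catalogs and filters each pass's
// output against it. All names are hypothetical.
public class Dedup {
    // Records still to process = this pass's output minus everything done so
    // far. Also marks the survivors as done, so they are skipped next time.
    static List<String> pending(List<String> output, Set<String> done) {
        List<String> todo = new ArrayList<>();
        for (String rec : output)
            if (done.add(rec))   // add() returns false if rec was already done
                todo.add(rec);
        return todo;
    }
}
```

Combined with a fresh output directory per pass, this gives the loop a natural base case: stop when `pending` comes back empty.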
>>>> 
>>>> -- Jim
>>>> 
>>>> 
>>>> 
>>>> On 10/29/07, David Balatero <[EMAIL PROTECTED]> wrote:
>>>>> Aren't these questions a little advanced for a bear to be asking?
>>>>> I'll be here all night...
>>>>> 
>>>>> But seriously, if your job is inherently recursive, one possible way
>>>>> to do it would be to make sure that you output in the same format
>>>>> that you input. Then you can keep re-reading the outputted file back
>>>>> into a new map/reduce job, until you hit some base case and you
>>>>> terminate. I've had a main method before that would kick off a bunch
>>>>> of jobs in a row -- but I wouldn't really recommend starting another
>>>>> map/reduce job in the scope of a running map() or reduce() method.
>>>>> 
>>>>> - David
>>>>> 
>>>>> 
>>>>> On Oct 29, 2007, at 2:17 PM, Jim the Standing Bear wrote:
>>>>> 
>>>>>> then
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> --------------------------------------
>>>> Standing Bear Has Spoken
>>>> --------------------------------------
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 
