Hey Eyal,

Actually, in the mode you call "command mode" there is no depth value.

To be more specific, the depth value is not "folder depth"; it is the number 
of times the crawler runs, starting from the seeds you gave it. For 
example, if your seed list contains one URL, www.sample.com, and in crawl 
mode you set "depth" to 3, then the crawler runs 3 rounds, where each 
round crawls the URLs found during the previous one. In the last 
stages of the crawl, after the crawling stage is done, the data would be 

So, to achieve this in "command mode" you would need to write a small bash 
script that copies that behavior, which is:

Repeat <depth> times:
    NewSegment = nutch generate   # generate the list of URLs to fetch
    nutch fetch NewSegment        # fetch the list of URLs
    nutch updatedb NewSegment     # update the status of crawled links and
                                  # add newly found links
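
As a rough sketch of that loop in shell (not a tested recipe): the function 
name crawl_rounds and the crawl/crawldb and crawl/segments paths are my own 
assumptions, and the command arguments follow the whole-web tutorial as I 
remember it, so please check them against the usage output of your Nutch 0.9 
install. It assumes "nutch inject" has already seeded the crawldb.

```shell
#!/bin/sh
# Sketch only: emulate the crawl tool's "depth" with the step-by-step commands.
# NUTCH, CRAWLDB and SEGMENTS are assumed locations; adjust to your install.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"

# Run one generate/fetch/updatedb round per depth level.
crawl_rounds() {
  depth="$1"
  round=1
  while [ "$round" -le "$depth" ]; do
    # Generate the list of URLs to fetch; this creates a new segment directory.
    "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" || return 1
    # Segment names are timestamps, so the newest one sorts last.
    segment="$(ls -d "$SEGMENTS"/2* | tail -n 1)"
    # Fetch the URLs listed in that segment.
    "$NUTCH" fetch "$segment" || return 1
    # Update the status of crawled links and add newly found links.
    "$NUTCH" updatedb "$CRAWLDB" "$segment" || return 1
    round=$((round + 1))
  done
}

# Example (equivalent of depth 3):
#   crawl_rounds 3
```

Each pass only fetches what the previous updatedb added to the crawldb, which 
is why the number of passes behaves like the "depth" setting of crawl mode.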


Gal Nitzan.

> -----Original Message-----
> From: eyal edri [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 30, 2007 10:49 AM
> To: nutch-agent@lucene.apache.org
> Subject: depth arg in non crawl mode (fetch)
> Hello,
> I'm testing Nutch 0.9 in the "Whole-Web" approach, where I use a set of
> commands to run the engine instead of just running "crawl",
> i.e. nutch inject
>      nutch generate
>      nutch fetch
>      nutch updatedb.. and so on.
> My question is, where can I define the depth arg (the same one that appears in
> the crawl mode) in the broken-up ('whole web') mode?
> thanks,
> --
> Eyal Edri
