Take a look at this good Nutch doc:

http://wiki.apache.org/nutch/DissectingTheNutchCrawler

Michael Ji

--- Daniele Menozzi <[EMAIL PROTECTED]> wrote:

> Hi all, I have questions regarding
> org.apache.nutch.tools.CrawlTool: I have
> not really understood what the relationship is
> between
> depth, segments, and fetching.
> Take the tutorial, for example. I understand these 2
> steps:
> 
>       bin/nutch admin db -create
>       bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000
> 
> but, when I do this:
>       
>       bin/nutch generate db segments
> 
> what happens? I think that a dir called 'segments'
> is created, and inside
> of it I can find the links I have previously
> injected. OK. Next steps:
>       
>       bin/nutch fetch $s1     
>       bin/nutch updatedb db $s1 
> 
> OK, no problems here.
> But now I cannot understand what happens with this
> command:
> 
>       bin/nutch generate db segments
> 
> it is the same command as above, but now I have not
> injected anything into the
> DB; it only contains the pages I've previously
> fetched.
> So, does it mean that when I generate a segment, it
> will automagically be
> filled with links found in the fetched pages? And where
> are these links saved?
> And who saves these links?
> 
> Thank you so much, this work is really interesting!
>       Menoz
> 
> -- 
>                     Free Software Enthusiast
>                Debian Powered Linux User #332564 
>                    http://menoz.homelinux.org
> 
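
In case it helps while reading that page, here is a rough toy model of the generate / fetch / updatedb cycle as I understand it. It is not Nutch code: WEB, generate, fetch, updatedb and crawl are made-up stand-ins, and the "db" is just a dict. The idea it tries to show is that fetch keeps the outlinks it finds with the segment, updatedb copies new links into the db, and the next generate builds its fetchlist from whatever in the db has not been fetched yet. "depth" is then simply the number of rounds.

```python
# Toy model of the generate / fetch / updatedb cycle. NOT Nutch code:
# every name here is a hypothetical stand-in used only for illustration.

# A pretend web: page URL -> outlinks found on that page.
WEB = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["a"],
}

def generate(db):
    """Build a fetchlist (a new segment) from URLs known but not yet fetched."""
    return sorted(url for url, fetched in db.items() if not fetched)

def fetch(segment):
    """'Download' each page in the segment and keep its outlinks with it."""
    return {url: WEB.get(url, []) for url in segment}

def updatedb(db, fetched_segment):
    """Mark pages as fetched and add newly discovered links to the db."""
    for url, outlinks in fetched_segment.items():
        db[url] = True
        for link in outlinks:
            db.setdefault(link, False)  # only adds links not already known

def crawl(seeds, depth):
    """Inject the seeds, then run `depth` generate/fetch/updatedb rounds."""
    db = {url: False for url in seeds}  # the "inject" step
    segments = []
    for _ in range(depth):
        segment = generate(db)
        if not segment:                 # nothing left to fetch
            break
        segments.append(segment)
        updatedb(db, fetch(segment))
    return db, segments

db, segments = crawl(["a"], depth=3)
print(segments)
```

So in this sketch the second generate has something to work with only because the previous updatedb put the links found in the fetched pages into the db.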

