Take a look at this good Nutch doc: http://wiki.apache.org/nutch/DissectingTheNutchCrawler
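A toy model of the crawl cycle the questions below ask about may help. This is a simplified sketch, not Nutch's code. It assumes a hypothetical in-memory graph `TOY_WEB` in place of real HTTP fetches, and the helpers `generate`, `fetch`, `updatedb` and `crawl` are hypothetical names that only mimic the roles of the `bin/nutch` commands. In the sketch, links found while fetching go into the fetched segment's output. The `updatedb` step then copies them into the db, and the next `generate` selects them. Each round produces a new segment, and `depth` is the number of generate/fetch/updatedb rounds. Real Nutch also handles refetch intervals, `-topN` limits and link scoring, none of which are modelled here.

```python
# Hypothetical toy web graph: page -> outlinks (stands in for real fetches).
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}


def generate(webdb):
    """Pick every known-but-unfetched page from the db: this is the new segment."""
    return sorted(url for url, fetched in webdb.items() if not fetched)


def fetch(segment, web):
    """'Fetch' each page in the segment and record its outlinks in the segment output."""
    return {url: web.get(url, []) for url in segment}


def updatedb(webdb, fetched_segment):
    """Mark fetched pages as done and add newly discovered outlinks to the db."""
    for url, outlinks in fetched_segment.items():
        webdb[url] = True
        for link in outlinks:
            webdb.setdefault(link, False)  # new pages start unfetched


def crawl(seeds, web, depth):
    """Run up to `depth` generate/fetch/updatedb rounds, one segment per round."""
    webdb = {url: False for url in seeds}  # plays the role of 'inject'
    segments = []
    for _ in range(depth):
        segment = generate(webdb)
        if not segment:  # nothing left to fetch
            break
        fetched = fetch(segment, web)
        segments.append(segment)
        updatedb(webdb, fetched)
    return webdb, segments
```

With `crawl(["a"], TOY_WEB, 2)`, the first segment contains only the injected page "a". The second contains "b" and "c", which exist in the db only because `updatedb` stored the links found in "a". Pages "d" and "e" are now known to the db but stay unfetched until a third round.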
Michael Ji

--- Daniele Menozzi <[EMAIL PROTECTED]> wrote:

> Hi all, I have questions regarding
> org.apache.nutch.tools.CrawlTool: I have not really
> understood what the relationship is between depth,
> segments, and fetching.
> Take the tutorial, for example. I understand these two
> steps:
>
> bin/nutch admin db -create
> bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000
>
> But what happens when I do this?
>
> bin/nutch generate db segments
>
> I think a directory called 'segments' is created, and
> inside it I can find the links I previously injected.
> OK. Next steps:
>
> bin/nutch fetch $s1
> bin/nutch updatedb db $s1
>
> OK, no problems here.
> But now I cannot understand what happens with this
> command:
>
> bin/nutch generate db segments
>
> It is the same command as above, but now I have not
> injected anything into the DB; it only contains the
> pages I previously fetched.
> So, does it mean that when I generate a segment, it
> will automagically be filled with links found in the
> fetched pages? And where are these links saved?
> And who saves these links?
>
> Thank you so much, this work is really interesting!
> Menoz
>
> --
> Free Software Enthusiast
> Debian Powered Linux User #332564
> http://menoz.homelinux.org
