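The command bin/nutch updatedb db $s1 updates the WebDB with the links found in the pages you fetched in segment $s1. The next bin/nutch generate db segments then builds a new segment from the WebDB pages that are due for fetching, and those now include the newly discovered links.

Regards
Piotr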
Daniele Menozzi wrote:
Hi all,

I have some questions about org.apache.nutch.tools.CrawlTool: I have not really understood the relationship between depth, segments and fetching.

Take the tutorial as an example. I understand these two steps:

    bin/nutch admin db -create
    bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000

But what happens when I run this?

    bin/nutch generate db segments

I think a directory called 'segments' is created, and inside it I can find the links I previously injected. OK. Next steps:

    bin/nutch fetch $s1
    bin/nutch updatedb db $s1

No problems here. But I cannot understand what happens when I run this command again:

    bin/nutch generate db segments

It is the same command as above, but this time I have not injected anything into the DB; it only contains the pages I previously fetched. So does it mean that when I generate a segment, it is automagically filled with the links found in the fetched pages? Where are these links saved, and who saves them?

Thank you so much, this work is really interesting!

Menoz
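As a rough sketch of how depth, segments and fetching fit together: the depth used by CrawlTool is, roughly, the number of generate/fetch/updatedb rounds it runs. The function crawl_loop is hypothetical (not part of Nutch) and assumes the tutorial-era command-line tools, run from the Nutch install directory with the WebDB "db" already created and injected. It simply repeats the three commands from the question DEPTH times.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: repeat the generate -> fetch -> updatedb
# cycle DEPTH times. Assumes it runs from the Nutch install directory and that
# the WebDB "db" was already created and injected as in the tutorial.
crawl_loop() {
  depth=$1
  i=1
  while [ "$i" -le "$depth" ]; do
    # Create a new segment whose fetchlist holds the WebDB pages due for fetching.
    bin/nutch generate db segments || return 1
    # Segment directories are named by timestamp, so the newest one sorts last.
    s=$(ls -d segments/2* | tail -1)
    # Download the pages listed in that segment.
    bin/nutch fetch "$s" || return 1
    # Record the fetch results in the WebDB and add the outlinks found in the
    # fetched pages, so the next "generate" can select them.
    bin/nutch updatedb db "$s" || return 1
    i=$((i + 1))
  done
}
```

For example, crawl_loop 3 runs three rounds: the injected pages are fetched in the first round, the pages they link to in the second, and so on. The real CrawlTool also performs further steps (such as indexing) after the loop, which this sketch leaves out.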
